(54) Title: VOICE USER INTERFACE WITH PERSONALITY
(57) Abstract 

The present invention provides a voice user 
interface with personality. In one embodiment, a 
method includes executing a voice user interface, and 
controlling the voice user interface to provide the voice 
user interface with a personality. The method includes 
selecting a prompt based on various context situations, 
such as a previously selected prompt and the user's 
experience with using the voice user interface. 

VOICE USER INTERFACE WITH PERSONALITY

TECHNICAL FIELD 

The present invention relates generally to user interfaces and, more particularly, to a voice user 
interface with personality. 

BACKGROUND ART 

Personal computers (PCs), sometimes referred to as micro-computers, have gained widespread use in 
recent years, primarily, because they are inexpensive and yet powerful enough to handle computationally- 
intensive applications. PCs typically include graphical user interfaces (GUIs). Users interact with and control 
an application executing on a PC using a GUI. For example, the Microsoft WINDOWS™ Operating System 
(OS) represents an operating system that provides a GUI. A user controls an application executing on a PC 
running the Microsoft WINDOWS™ OS using a mouse to select menu commands and click on and move 
icons. 

The increasingly powerful applications for computers have led to a growing use of computers for 
various computer telephony applications. For example, voice mail systems are typically implemented using 
software executing on a computer that is connected to a telephone line for storing voice data signals 
transmitted over the telephone line. A user of a voice mail system typically controls the voice mail system 
using dual tone multiple frequency (DTMF) commands and, in particular, using a telephone keypad to select 
the DTMF commands available. For example, a user of a voice mail system typically dials a designated voice 
mail telephone number, and the user then uses keys of the user's telephone keypad to select various commands 
of the voice mail system's command hierarchy. Telephony applications can also include a voice user interface 
that recognizes speech signals and outputs speech signals. 

DISCLOSURE OF INVENTION 

The present invention provides a voice user interface with personality. For example, the present
invention provides a cost-effective and high-performance computer-implemented voice user interface with
personality that can be used for various applications in which a voice user interface is desired, such as
telephony applications.

In one embodiment, a method includes executing a voice user interface, and controlling the voice
user interface to provide the voice user interface with a personality. A prompt is selected among various 
prompts based on various criteria. For example, the prompt selection is based on a prompt history. 
Accordingly, this embodiment provides a computer system that executes a voice user interface with 
personality. 

In one embodiment, controlling the voice user interface includes selecting a smooth hand-off prompt 
to provide a smooth hand-off between a first voice and a second voice of the voice user interface, selecting 
polite prompts such that the voice user interface behaves consistently with social and emotional norms, 
including politeness, while interacting with a user of the computer system, selecting brief negative prompts in 
situations in which negative comments are required, and selecting a lengthened prompt or shortened prompt 
based on a user's experience with the voice user interface. 

In one embodiment, controlling the voice user interface includes providing the voice user interface 
with multiple personalities. The voice user interface with personality installs a prompt suite for a particular 
personality from a prompt repository that stores multiple prompt suites, in which the multiple prompt suites 
are for different personalities of the voice user interface with personality. 
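
The prompt repository described above can be sketched as a small data structure. This is a minimal illustration only: the class names `PromptSuite` and `PromptRepository`, the `install` method, and the sample prompt wordings are all hypothetical and do not come from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PromptSuite:
    """All prompts recorded for one personality (hypothetical structure)."""
    personality: str
    prompts: Dict[str, List[str]]  # prompt id -> alternative wordings


@dataclass
class PromptRepository:
    """Stores several prompt suites, one per personality."""
    suites: Dict[str, PromptSuite] = field(default_factory=dict)

    def add(self, suite: PromptSuite) -> None:
        self.suites[suite.personality] = suite

    def install(self, personality: str) -> PromptSuite:
        # Raises KeyError if no suite was stored for this personality.
        return self.suites[personality]


repo = PromptRepository()
repo.add(PromptSuite("friendly-dominant",
                     {"greeting": ["Welcome back. Say a command."]}))
repo.add(PromptSuite("friendly-submissive",
                     {"greeting": ["Hi there. What would you like to do?"]}))

# Installing a suite selects which personality the interface speaks with.
active = repo.install("friendly-submissive")
```

Keeping each personality's prompts in a separate suite means that switching personalities changes only which suite is installed, not the dialog logic.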

Other aspects and advantages of the present invention will become apparent from the following 
detailed description and accompanying drawings. 

BRIEF DESCRIPTION OF DRAWINGS 

FIG. 1 is a block diagram of a voice user interface with personality in accordance with one
embodiment of the present invention.

FIG. 2 is a block diagram of a voice user interface with personality that includes multiple 
personalities in accordance with one embodiment of the present invention. 

FIG. 3 is a flow diagram illustrating a process for implementing a computer-implemented voice user 
interface with personality in accordance with one embodiment of the present invention. 

FIG. 4 is a block diagram of the computer-implemented voice user interface with personality of FIG. 
1 shown in greater detail in accordance with one embodiment of the present invention. 

FIG. 5 is a block diagram of the personality engine of FIG. 1 shown in greater detail in accordance 
with one embodiment of the present invention. 



WO 99/57714 PCT/US99/09290


FIG. 6 is a flow diagram of the operation of the negative comments rules of the personality engine of 
FIG. 5 in accordance with one embodiment of the present invention. 

FIG. 7 is a flow diagram of the operation of the politeness rules of the personality engine of FIG. 5 in 
accordance with one embodiment of the present invention. 
FIG. 8 is a flow diagram of the operation of the multiple voices rules of the personality engine of
FIG. 5 in accordance with one embodiment of the present invention.

FIG. 9 is a block diagram of a voice user interface with personality for an application in accordance 
with one embodiment of the present invention. 

FIG. 10 is a functional diagram of a dialog interaction between the voice user interface with
personality and a subscriber in accordance with one embodiment of the present invention.

FIG. 11 is a flow diagram of the operation of the voice user interface with personality of FIG. 10
during an interaction with a subscriber in accordance with one embodiment of the present invention. 

FIG. 12 provides a command specification of a modify appointment command for the system of FIG. 
9 in accordance with one embodiment of the present invention. 
FIGs. 13A and 13B are a flow diagram of a dialog for a modify appointment command between the
voice user interface with personality of FIG. 10 and a subscriber in accordance with one embodiment of the
present invention.

FIG. 14 shows a subset of the dialog for the modify appointment command of the voice user interface 
with personality of FIG. 10 in accordance with one embodiment of the present invention. 
FIG. 15 provides scripts written for a mail domain of the system of FIG. 9 in accordance with one
embodiment of the present invention.

FIG. 16 is a flow diagram for selecting and executing a prompt by the voice user interface with 
personality of FIG. 10 in accordance with one embodiment of the present invention. 

FIG. 17 is a block diagram of a memory that stores recorded prompts in accordance with one
embodiment of the present invention.

FIG. 18 is a finite state machine diagram of the voice user interface with personality of FIG. 10 in 
accordance with one embodiment of the present invention. 

FIG. 19 is a flow diagram of the operation of the voice user interface with personality of FIG. 10
using a recognition grammar in accordance with one embodiment of the present invention. 

MODES FOR CARRYING OUT THE INVENTION 

The present invention provides a voice user interface with personality. The term "personality" as
used in the context of a voice user interface can be defined as the totality of spoken language characteristics
that simulate the collective character, behavioral, temperamental, emotional, and mental traits of human beings
in a way that would be recognized by psychologists and social scientists as consistent and relevant to a
particular personality type. For example, personality types include the following: friendly-dominant, friendly-
submissive, unfriendly-dominant, and unfriendly-submissive. Accordingly, a computer system that interacts
with a user (e.g., over a telephone) and in which it is desirable to offer a voice user interface with personality
would particularly benefit from the present invention.

A Voice User Interface With Personality 

FIG. 1 is a block diagram of a voice user interface with personality in accordance with one
embodiment of the present invention. FIG. 1 includes a computer system 100. Computer system 100 includes
a memory 101 (e.g., volatile and non-volatile memory) and a processor 105 (e.g., an Intel PENTIUM™
microprocessor), and computer system 100 is connected to a standard display 116 and a standard keyboard
118. These elements are those typically found in most general purpose computers, and in fact, computer
system 100 is intended to be representative of a broad category of data processing devices. Computer system
100 can also be in communication with a network (e.g., connected to a LAN). It will be appreciated by one of
ordinary skill in the art that computer system 100 can be part of a larger system.

Memory 101 stores a voice user interface with personality 103 that interfaces with an application
106. Voice user interface with personality 103 includes voice user interface software 102 and a personality
engine 104. Voice user interface software 102 is executed on processor 105 to allow user 112 to verbally
interact with application 106 executing on computer system 100 via a microphone and speaker 114. Computer
system 100 can also be controlled using a standard graphical user interface (GUI) (e.g., a Web browser) via
keyboard 118 and monitor 116.

Voice user interface with personality 103 uses a dialog to interact with user 112. Voice user interface
with personality 103 interacts with user 112 in a manner that gives user 112 the impression that voice user
interface with personality 103 has a personality. The personality of voice user interface with personality 103
is generated using personality engine 104, which controls the dialog output by voice user interface software
102 during interactions with user 112. For example, personality engine 104 can implement any application-
specific, cultural, politeness, psychological, or social rules and norms that emulate or model human verbal
behavior (e.g., providing varied verbal responses) such that user 112 receives an impression of a voice user
interface with a personality when interacting with computer system 100. Accordingly, voice user interface
with personality 103 executed on computer system 100 provides a computer-implemented voice user interface
with personality.

FIG. 2 is a block diagram of a voice user interface with personality that includes multiple 
personalities in accordance with one embodiment of the present invention. FIG. 2 includes a computer system 
200, which includes a memory 201 (e.g., volatile and non-volatile memory) and a processor 211 (e.g., an Intel
PENTIUM™ microprocessor). Computer system 200 can be a standard computer or any data processing
device. It will be appreciated by one of ordinary skill in the art that computer system 200 can be part of a
larger system.

Memory 201 stores a voice user interface with personality 203, which interfaces with an application 
211 (e.g., a telephony application that provides a voice mail service). Voice user interface with personality 
203 includes voice user interface software 202. Voice user interface with personality 203 also includes a
personality engine 204. Personality engine 204 controls voice user interface software 202 to provide a voice
user interface with a personality. For example, personality engine 204 provides a friendly-dominant 
personality that interacts with a user using a dialog of friendly directive statements (e.g., statements that are 
spoken typically as commands with few or no pauses). 

Memory 201 also stores a voice user interface with personality 205, which interfaces with application
211. Voice user interface with personality 205 includes voice user interface software 208. Voice user
interface with personality 205 also includes a personality engine 206. Personality engine 206 controls voice
user interface software 208 to provide a voice user interface with a personality. For example, personality
engine 206 provides a friendly-submissive personality that interacts with a user using a dialog of friendly but
submissive statements (e.g., statements that are spoken typically as questions and with additional explanation
or pause).

User 212 interacts with voice user interface with personality 203 executing on computer system 200 
using a telephone 214 that is in communication with computer system 200 via a network 215 (e.g., a telephone
line). User 218 interacts with voice user interface with personality 205 executing on computer system 200
using a telephone 216 that is in communication with computer system 200 via network 215. 



An Overview of an Implementation of a Computer-Implemented Voice User Interface With Personality 

FIG. 3 is a flow diagram illustrating a process for implementing a computer-implemented voice user
interface with personality in accordance with one embodiment of the present invention.

At stage 300, market requirements are determined. The market requirements represent the desired
application functionality of target customers or subscribers for a product or service, which includes a voice
user interface with personality.

At stage 302, application requirements are defined. Application requirements include functional
requirements of a computer-implemented system that will interact with users using a voice user interface with
personality. For example, application requirements include various functionality such as voice mail and
electronic mail (email). The precise use of the voice user interface with personality within the system is also
determined.

At stage 304, a personality is selected. The personality can be implemented as personality engine
104 to provide a voice user interface 102 with personality. For example, a voice user interface with
personality uses varied responses to interact with a user.

In particular, those skilled in the art of, for example, social psychology review the application 
requirements, and they then determine which personality types best serve the delivery of a voice user interface 
for the functions or services included in the application requirements. A personality or multiple personalities
are selected, and a complete description is created of a stereotypical person displaying the selected personality
or personalities, such as age, gender, education, employment history, and current employment position. 
Scenarios are developed for verbal interaction between the stereotypical person and typical users. 

At stage 306, an actor is selected to provide the voice of the selected personality. The selection of an
actor for a particular personality is further discussed below. 

At stage 308, a dialog is generated based on the personality selected at stage 304. The dialog
represents the dialog that the voice user interface with personality uses to interact with a user at various levels
within a hierarchy of commands of the system. For example, the dialog can include various greetings that are
output to a user when the user logs onto the system. In particular, based on the selected personality, the
dialogs are generated that determine what the computer-implemented voice user interface with personality can
output (e.g., say) to a user to start various interactions, and what the computer-implemented voice user
interface with personality can output to respond to various types of questions or responses in various situations
during interactions with the user.

At stage 310, scripts are written for the dialog based on the selected personality. For example, scripts 
for a voice user interface with personality that uses varied responses can be written to include varied greetings, 
which can be randomly selected when a user logs onto the system to be output by the voice user interface with 
personality to the user. During stage 310, script writers, such as professional script writers who would
typically be writing for television programs or movies, are given the dialogs generated during stage 308 and
instructed to re-write the dialogs using language that consistently represents the selected personality. 
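
The random selection of varied greetings can be sketched as follows. This is a minimal illustration under stated assumptions: the greeting texts, the class name `VariedPromptSelector`, and the `history_size` parameter are hypothetical, and excluding recently used greetings is one plausible reading of selecting a prompt based on a prompt history.

```python
import random
from collections import deque

# Illustrative greetings only; the scripts themselves are not given in the text.
GREETINGS = ["Hello.", "Good to hear from you.", "Welcome back.", "Hi there."]


class VariedPromptSelector:
    """Pick a random prompt while avoiding the most recently used ones."""

    def __init__(self, prompts, history_size=2, rng=None):
        if history_size >= len(prompts):
            raise ValueError("history_size must be smaller than the number of prompts")
        self.prompts = list(prompts)
        self.history = deque(maxlen=history_size)  # most recent selections
        self.rng = rng or random.Random()

    def next_prompt(self):
        # Only prompts that are absent from the recent history are eligible.
        candidates = [p for p in self.prompts if p not in self.history]
        choice = self.rng.choice(candidates)
        self.history.append(choice)
        return choice


selector = VariedPromptSelector(GREETINGS, history_size=2, rng=random.Random(0))
picks = [selector.next_prompt() for _ in range(20)]
```

With a history of two, a caller never hears the same greeting on consecutive logins or on logins two apart, which supports the impression of varied responses.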

At stage 312, the application is implemented. The application is implemented based on the
application requirements and the dialog. For example, a finite state machine can be generated, which can then
be used as a basis for a computer programmer to efficiently and cost-effectively code the voice user interface
with personality. In particular, a finite state machine is generated such that all functions specified in the
application requirements of the system can be accessed by a user interacting with the computer-implemented
voice user interface with personality. The finite state machine is then coded in a computer language that can
be compiled or interpreted and then executed on a computer such as computer system 100. For example, the
finite state machine can be coded in "C" code and compiled using various C compilers for various computer
platforms (e.g., the Microsoft WINDOWS™ OS executing on an Intel X86™/PENTIUM™ microprocessor).
The computer programs are executed by a data processing device such as computer system 100 and thereby
provide an executable voice user interface with personality. For example, commercially available tools
provided by ASR vendors such as Nuance Corporation of Menlo Park, CA, can be used to guide software
development at stage 318.
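
A dialog finite state machine of the kind described at stage 312 can be sketched as a transition table. The text mentions C as one possible language; Python is used here only for brevity. The state names, commands, and prompt wordings are hypothetical, and `reachable_states` is an illustrative check that every function can be accessed by a user.

```python
# Hypothetical states and commands; the specification does not list them here.
TRANSITIONS = {
    ("main_menu", "get my voice mail"): "voice_mail",
    ("main_menu", "read my email"): "email",
    ("voice_mail", "main menu"): "main_menu",
    ("email", "main menu"): "main_menu",
}

PROMPTS = {
    "main_menu": "What would you like to do?",
    "voice_mail": "Here is your voice mail.",
    "email": "Here is your email.",
}


class DialogStateMachine:
    def __init__(self, start="main_menu"):
        self.state = start

    def handle(self, command):
        """Advance on a recognized command; stay put if it is not valid here."""
        next_state = TRANSITIONS.get((self.state, command))
        if next_state is None:
            return "Sorry, I didn't catch that."
        self.state = next_state
        return PROMPTS[next_state]


def reachable_states(start="main_menu"):
    """Return every state a user can reach from the start state."""
    seen = {start}
    frontier = [start]
    while frontier:
        state = frontier.pop()
        for (src, _command), dst in TRANSITIONS.items():
            if src == state and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return seen


fsm = DialogStateMachine()
reply = fsm.handle("get my voice mail")
```

Comparing `reachable_states()` against the full set of states is one simple way to confirm that no required function is stranded behind a missing transition.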

Stage 314 determines whether the scripted dialog can be practically and efficiently implemented for 
the voice user interface with personality of the application. For example, if the scripted dialog cannot be
practically and efficiently implemented for the voice user interface with personality of the application (e.g., by
failing to collect from a user of the application a parameter that is required by the application), then the dialog 
is refined at stage 308. 

At stage 316, the scripts (e.g., prompts) are recorded using the selected actor. The scripts are read by
the actor as directed by a director in a manner that provides recorded scripts of the actor's voice reflecting a
personality consistent with the selected personality. For example, a system that includes a voice user interface
with personality that provides a friendly-dominant personality would have the
speaker speak more softly and exhibit greater pitch range than if the voice user interface had a friendly-
submissive personality.

At stage 318, a recognition grammar is generated. The recognition grammar specifies a set of
commands that a voice user interface with personality can understand when spoken by a user. For example, a
computer-implemented system that provides voice mail functionality can include a recognition grammar that 
allows a user to access voice mail by saying "get my voice mail", "do I have any voice mail", and "please get 
me my voice mail". Also, if the voice user interface with personality includes multiple personalities, then each 
of the personalities of the voice user interface with personality may include a unique recognition grammar. 
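
A recognition grammar that maps several phrasings to one command can be sketched as a lookup table. This sketch operates on text transcripts rather than audio, which is a simplification; the command name `get_voice_mail` and the helper names are hypothetical, while the three phrases are the examples given above.

```python
# Phrases are the examples from the text; the command name is hypothetical.
GRAMMAR = {
    "get_voice_mail": [
        "get my voice mail",
        "do i have any voice mail",
        "please get me my voice mail",
    ],
}


def normalize(utterance):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    kept = "".join(ch for ch in utterance.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


# Invert the grammar so each accepted phrase points at its command.
PHRASE_TO_COMMAND = {
    phrase: command for command, phrases in GRAMMAR.items() for phrase in phrases
}


def recognize(utterance):
    """Return the command for an in-grammar utterance, or None otherwise."""
    return PHRASE_TO_COMMAND.get(normalize(utterance))


command = recognize("Do I have any voice mail?")
```

An utterance such as "check messages" returns `None` here, which illustrates how a grammar that is too restrictive fails to cover what a user actually says.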

In particular, commercially available speech recognition systems with recognition grammars are
provided by ASR (Automatic Speech Recognition) technology vendors such as the following: Nuance
Corporation of Menlo Park, CA; Dragon Systems of Newton, MA; IBM of Austin, TX; Kurzweil Applied
Intelligence of Waltham, MA; Lernout & Hauspie Speech Products of Burlington, MA; and PureSpeech, Inc. of
Cambridge, MA. Recognition grammars are written specifying what sentences and phrases are to be
recognized by the voice user interface with personality (e.g., in different states of the finite state machine).
For example, a recognition grammar can be generated by a computer scientist or a computational linguist or a
linguist. The accuracy of the speech recognized ultimately depends on the selected recognition grammars.
For example, recognition grammars that permit too many alternatives can result in slow and inaccurate ASR 
performance. On the other hand, recognition grammars that are too restrictive can result in a failure to
encompass a user's input. In other words, users would either need to memorize what they could say or be
faced with a likely failure of the ASR system to recognize what they say as the recognition grammar did not
anticipate the sequence of words actually spoken by the user. Thus, crafting of recognition grammars can
often be helped by changing the prompts of the dialog. A period of feedback is generally helpful in tabulating
speech recognition errors such that recognition grammars can be modified and scripts modified as well as help 
generated in order to coach a user to say phrases or commands that are within the recognition grammar. 
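
The feedback period can be illustrated by tabulating utterances that fell outside the grammar. This is a sketch under assumptions: it presumes that transcripts of what users said are available as text, and the function name `tabulate_misses` and the sample data are hypothetical.

```python
from collections import Counter


def tabulate_misses(transcripts, grammar_phrases):
    """Count utterances that the grammar did not anticipate."""
    in_grammar = {phrase.lower() for phrase in grammar_phrases}
    return Counter(t.lower() for t in transcripts if t.lower() not in in_grammar)


# Hypothetical feedback data gathered during a trial period.
grammar_phrases = ["get my voice mail", "do I have any voice mail"]
transcripts = [
    "get my voice mail",
    "check my messages",
    "Check my messages",
    "any new voice mail",
]

misses = tabulate_misses(transcripts, grammar_phrases)
most_common_miss = misses.most_common(1)[0]
```

Frequent misses suggest either adding the phrase to the grammar or rewording the prompt so that users are coached toward phrases the grammar already covers.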



A Computer-Implemented Voice User Interface With Personality 

FIG. 4 is a block diagram of the computer-implemented voice user interface with personality of FIG.
1 shown in greater detail in accordance with one embodiment of the present invention. FIG. 4 includes
computer system 100 that executes voice user interface software 102 that is controlled by personality engine
104. Voice user interface software 102 interfaces with an application 410 (e.g., a telephony application).
Computer system 100 can be a general purpose computer such as a personal computer (PC). For example,
computer system 100 can be a PC that includes an Intel PENTIUM™ running the Microsoft WINDOWS 95™
operating system (OS) or the Microsoft WINDOWS NT™ OS.

Computer system 100 includes telephone line cards 402 that allow computer system 100 to
communicate with telephone lines 413. Telephone lines 413 can be analog telephone lines, digital T1 lines,
digital T3 lines, or OC3 telephony feeds. For example, telephone line cards 402 can be commercially
available telephone line cards with 24 lines from Dialogic Corporation of Parsippany, NJ, or commercially
available telephone line cards with 2 to 48 lines from Natural MicroSystems Inc. of Natick, MA. Computer
system 100 also includes a LAN (Local Area Network) connector 403 that allows computer system 100 to
communicate with a network such as a LAN or Internet 404, which uses the well-known TCP/IP
(Transmission Control Protocol/Internet Protocol). For example, LAN card 403 can be a commercially
available LAN card from 3COM Corporation of Santa Clara, California. The voice user interface with
personality may need to access various remote databases and, thus, can reach the remote databases via LAN or
Internet 404. Accordingly, the network, LAN or Internet 404, is integrated into the system, and databases
residing on remote servers can be accessed by voice user interface software 102 and personality engine 104.

Users interact with voice user interface software 102 over telephone lines 413 through telephone line
cards 402 via speech input data 405 and speech output data 412. For example, speech input data 405 can be 
coded as 32-kilobit ADPCM (Adaptive Differential Pulse Coded Modulation) or 64-KB MU-law parameters 
using commercially available modulation devices from Rockwell International of Newport Beach, 
CA. 
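
The two coding rates can be related by simple arithmetic, and mu-law companding can be sketched with its standard continuous formula. The assumptions here are labeled: the figures above are read as 32 and 64 kilobits per second, the sampling rate is taken to be the usual 8000 samples per second for telephone speech, and the continuous formula with mu = 255 stands in for the segmented 8-bit encoding used on real lines.

```python
import math

MU = 255  # standard mu value for North American and Japanese telephony


def mu_law_compress(x):
    """Map a sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Invert mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def bit_rate(sample_rate_hz, bits_per_sample):
    """Bits per second for uncompressed fixed-width samples."""
    return sample_rate_hz * bits_per_sample


# Assumed 8000 samples/s: 8-bit mu-law versus 4-bit ADPCM.
mu_law_rate = bit_rate(8000, 8)
adpcm_rate = bit_rate(8000, 4)
round_trip = mu_law_expand(mu_law_compress(0.3))
```

Companding spends more of the 8-bit code range on quiet samples, which is why small amplitudes map to comparatively large companded values.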

Voice user interface software 102 includes echo cancellation software 406. Echo cancellation 
software 406 removes echoes caused by delays in the telephone system or reflections from acoustic waves in 
the immediate environment of the telephone user such as in an automobile. Echo cancellation software 406 is 
commercially available from Noise Cancellation Technologies of Stamford, CT.

Voice user interface software 102 also includes barge-in software 407. Barge-in software detects 
speech from a user in contrast to ambient background noise. When speech is detected, any speech output from 
computer system 100 such as via speech output data 412 is shut off at its source in the software so that the 
software can attend to the new speech input. The effect observed by a user (e.g., a telephone caller) is the
ability of the user to interrupt computer system 100 generated speech simply by talking. Barge-in software 
407 is commercially available from line card manufacturers and ASR technology suppliers such as Dialogic 
Corporation of Parsippany, NJ, and Natural MicroSystems Inc. of Natick, MA. Barge-in increases an
individual's sense that they are interacting with a voice user interface with personality. 
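
Barge-in can be sketched as an energy detector that stops prompt playback once speech persists above the background noise. This is an illustrative simplification, not the commercial software named above: the energy-threshold method and the names `BargeInDetector`, `ratio`, `min_frames`, and `play_prompt` are all hypothetical.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


class BargeInDetector:
    """Flag speech when energy stays above the noise floor for several frames."""

    def __init__(self, noise_floor, ratio=4.0, min_frames=3):
        self.threshold = noise_floor * ratio
        self.min_frames = min_frames
        self.run = 0  # consecutive frames above the threshold

    def feed(self, frame):
        if frame_energy(frame) > self.threshold:
            self.run += 1
        else:
            self.run = 0
        return self.run >= self.min_frames


def play_prompt(prompt_frames, mic_frames, detector):
    """Play the prompt frame by frame; stop as soon as the caller barges in.

    Returns the number of prompt frames that were played.
    """
    played = 0
    for _prompt_frame, mic_frame in zip(prompt_frames, mic_frames):
        if detector.feed(mic_frame):
            break  # shut off speech output at its source
        played += 1
    return played


quiet = [0.01] * 80
loud = [0.5] * 80
detector = BargeInDetector(noise_floor=frame_energy(quiet))
mic = [quiet] * 5 + [loud] * 10
played = play_prompt([None] * 15, mic, detector)
```

Requiring several consecutive loud frames keeps a single click or cough from cutting the prompt off.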

Voice user interface software 102 also includes signal processing software 408. Speech recognizers 
typically do not operate directly on time domain data such as ADPCM. Accordingly, signal processing 
software 408 performs signal processing operations, which result in transforming speech into a series of 
frequency domain parameters such as standard cepstral coefficients. For example, every 10 milliseconds, a 
twelve-dimensional vector of cepstral coefficients is produced to model speech input data 405. Signal 
processing software 408 is commercially available from line card manufacturers and ASR technology 
suppliers such as Dialogic Corporation of Parsippany, NJ, and Natural MicroSystems Inc. of Natick, MA. 
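
The conversion from time domain samples to one twelve-dimensional cepstral vector every 10 milliseconds can be sketched as follows. This is a simplified illustration: it assumes an 8000 Hz sampling rate, uses non-overlapping frames and a plain real cepstrum computed with a naive discrete Fourier transform, and omits the windowing and filter banks that production recognizers typically apply. The function names are hypothetical.

```python
import cmath
import math

SAMPLE_RATE_HZ = 8000   # assumed telephone-band sampling rate
FRAME_SHIFT_MS = 10     # from the text: one vector every 10 milliseconds
NUM_COEFFS = 12         # from the text: twelve-dimensional vectors


def real_cepstrum(frame, num_coeffs=NUM_COEFFS):
    """Return cepstral coefficients 1..num_coeffs of one frame."""
    n = len(frame)
    # Log magnitude spectrum via a naive DFT (small epsilon avoids log(0)).
    log_mag = []
    for k in range(n):
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        log_mag.append(math.log(abs(x_k) + 1e-10))
    # Inverse DFT of the log spectrum; it is real and even, so a cosine sum suffices.
    return [
        sum(log_mag[k] * math.cos(2 * math.pi * k * q / n) for k in range(n)) / n
        for q in range(1, num_coeffs + 1)
    ]


def cepstral_vectors(samples):
    """Split samples into 10 ms frames and compute one vector per frame."""
    step = SAMPLE_RATE_HZ * FRAME_SHIFT_MS // 1000  # 80 samples per frame
    return [
        real_cepstrum(samples[i:i + step])
        for i in range(0, len(samples) - step + 1, step)
    ]


# 100 ms of a 440 Hz tone as stand-in input.
samples = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE_HZ) for t in range(800)]
vectors = cepstral_vectors(samples)
```

One hundred milliseconds of input yields ten vectors, each with twelve coefficients, matching the frame rate and dimensionality stated above.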

Voice user interface software 102 also includes ASR/NL software 409. ASR/NL software 409 
performs automatic speech recognition (ASR) and natural language (NL) speech processing. For example, 
ASR/NL software is commercially available from the following companies: Nuance Corporation of Menlo 
Park, CA, as a turn-key solution; Applied Language Technologies, Inc. of Boston, MA; Dragon Systems of 
Newton, MA; and PureSpeech, Inc. of Cambridge, MA. The natural language processing component can be
obtained separately as commercially available software products from UNISYS Corporation of Blue Bell, PA.
The commercially available software typically is modified for particular applications such as a computer
telephony application. For example, the voice user interface with personality can be modified to include a
customized grammar, as further discussed below.

Voice user interface software 102 also includes TTS/recorded speech output software 411. Text-to-
speech (TTS)/recorded speech output software 411 provides functionality that enables computer system 100 to
talk (e.g., output speech via speech output data 412) to a user of computer system 100. For example, if the
information to be communicated to the user or the caller originates as text such as an email document, then
TTS software 411 speaks the text to the user via speech output data 412 over telephone lines 413. For
example, TTS software is commercially available from the following companies: AcuVoice, Inc. of San Jose,
CA; Centigram Communications Corporation of San Jose, CA; Digital Equipment Corporation (DEC) of
Maynard, MA; Lucent Technologies of Murray Hill, NJ; and Entropic Research Laboratory, Inc. of Menlo
Park, CA. TTS/recorded speech software 411 also allows computer system 100 to output recorded speech
(e.g., recorded prompts) to the user via speech output data 412 over telephone lines 413. For example, several
thousand recorded prompts can be stored in memory 101 of computer system 100 (e.g., as part of personality
engine 104) and played back at any appropriate time, as further discussed below. Accordingly, the variety and
personality provided by the recorded prompts and the context sensitivity of the selection and output of the
recorded prompts by personality engine 104 provide a voice user interface with personality implemented in
computer system 100.

Application 410 is in communication with a LAN or the Internet 404. For example, application 410 
is a telephony application that provides access to email, voice mail, fax, calendar, address book, phone book, 
stock quotes, news, and telephone switching equipment. Application 410 transmits a request for services that
can be served by remote computers using the well-known TCP/IP protocol over LAN or the Internet 404. 

Accordingly, voice user interface software 102 and personality engine 104 execute on computer
system 100 (e.g., execute on a microprocessor such as an Intel PENTIUM™ microprocessor) to provide a
voice user interface with personality that interacts with a user via telephone lines 413.

Personality Engine

FIG. 5 is a block diagram of the personality engine of FIG. 1 shown in greater detail in accordance 
with one embodiment of the present invention. Personality engine 104 is a rules-based engine for controlling
voice user interface software 102.

Personality engine 104 implements negative comments rules 502, which are further discussed below
with respect to FIG. 6. Personality engine 104 also implements politeness rules 504, which are further
discussed below with respect to FIG. 7. Personality engine 104 implements multiple voices rules 506, which
are further discussed below with respect to FIG. 8. Personality engine 104 also implements expert/novice
rules 508, which include rules for controlling the voice user interface in situations in which the user learns
over time what the system can do and thus needs less helpful prompting. For example, expert/novice rules
508 control the voice user interface such that the voice user interface outputs recorded prompts of an
appropriate length (e.g., detail) depending on a particular user's expertise based on the user's current session
and based on the user's experience across sessions (e.g., personality engine 104 maintains state information
for each user of computer system 100). Accordingly, personality engine 104 executes various rules that direct
the behavior of voice user interface software 102 while interacting with users of the system in order to create
an impression upon the user that voice user interface with personality 103 has a personality.
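
The expert/novice rules can be sketched as a choice of prompt length driven by per-user state. This is a minimal illustration under assumptions: the `UserState` fields, the thresholds `expert_sessions` and `warmup_uses`, the three length labels, and the prompt wordings are hypothetical; the text states only that prompt length depends on experience within the current session and across sessions.

```python
from dataclasses import dataclass


@dataclass
class UserState:
    """Per-user state kept by the personality engine (hypothetical fields)."""
    sessions_completed: int = 0   # experience across sessions
    uses_this_session: int = 0    # experience within the current session


# Illustrative wordings of one prompt at three levels of detail.
PROMPT_VARIANTS = {
    "long": ("You can say 'get my voice mail', 'read my email', or "
             "'check my calendar'. What would you like to do?"),
    "short": "What would you like to do?",
    "terse": "Ready.",
}


def select_prompt_length(user, expert_sessions=5, warmup_uses=3):
    """Return 'long', 'short', or 'terse' based on the user's experience."""
    if user.sessions_completed >= expert_sessions:
        return "terse"
    if user.uses_this_session >= warmup_uses:
        return "short"
    return "long"


novice = UserState()
prompt = PROMPT_VARIANTS[select_prompt_length(novice)]
```

A first-time caller hears the full list of commands, while a caller with many completed sessions hears only a brief cue.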

FIG. 6 is a flow diagram of the operation of negative comments rules 502 of personality engine 104
of FIG. 5 in accordance with one embodiment of the present invention. Negative comments rules 502 include
rules that are based on social-psychology empirical observations that (i) negative material is generally more
arousing than positive material, (ii) people do not like others who criticize or blame, and (iii) people who
blame themselves are seen as less competent. Accordingly, FIG. 6 is a flow diagram of the
operation of negative comments rules 502 that implements these social-psychology empirical observations in
accordance with one embodiment of the present invention.

At stage 602, it is determined whether a negative comment is currently required (i.e., whether voice
user interface software 102 is at a stage of interaction with a user at which voice user interface software 102
needs to provide some type of negative comment to the user). If so, operation proceeds to stage 604.

At stage 604, it is determined whether there has been a failure (i.e., whether the negative comment is 
one that reports a failure). If so, operation proceeds to stage 606. Otherwise, operation proceeds to stage 608. 






WO 99/57714 PCT/US 99/09290 

At stage 606, a prompt (e.g., a recorded prompt) that briefly states the problem or blames a third
party is selected. This "state the problem or blame a third party" rule is based on a social-psychology empirical
observation that when there is a failure, a system should neither blame the user nor take blame itself, but
instead the system should simply state the problem or blame a third party. For example, at stage 606, a
recorded prompt that states the problem or blames a third party is selected for output to the user, such as
"there seems to be a problem in getting your appointments for today" or "the third-party news service is not
working right now".

At stage 608, the volume is lowered for audio data output to the user, such as speech output data 412,
for the subsequent negative comment (e.g., recorded prompt) to be uttered by recorded speech software 411 of
voice user interface software 102. This "lower the volume" rule is based on a social-psychology empirical
observation that negative comments should generally have a lower volume than positive comments.

At stage 610, a brief comment (e.g., a brief recorded prompt) is selected to utter as the
negative comment to the user. This brief comment rule is based on a social-psychology empirical observation
that negative comments should be shorter and less elaborate than positive comments.
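
The flow of stages 602 through 610 can be sketched as a small decision function. This is only an illustrative reading of FIG. 6: the `PromptChoice` type, the style labels, and the 0.7 gain are hypothetical, and the sketch assumes the failure branch ends at stage 606 with normal volume, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptChoice:
    style: str     # which kind of recorded prompt to select
    volume: float  # playback gain, where 1.0 is normal volume


def negative_comment_rules(negative_needed: bool,
                           is_failure: bool,
                           lowered_volume: float = 0.7) -> Optional[PromptChoice]:
    """Illustrative reading of stages 602-610 of FIG. 6."""
    if not negative_needed:   # stage 602: no negative comment required
        return None
    if is_failure:            # stage 604: the comment reports a failure
        # Stage 606: state the problem or blame a third party.
        return PromptChoice("state_problem_or_blame_third_party", 1.0)
    # Stage 608 lowers the volume; stage 610 selects a brief comment.
    return PromptChoice("brief_negative_comment", lowered_volume)
```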

FIG. 7 is a flow diagram of the operation of politeness rules 504 of personality engine 104 of FIG. 5
in accordance with one embodiment of the present invention. Politeness rules 504 include rules that are based
on Grice's maxims for politeness as follows: the quantity that a person should say during a dialog with
another person should be neither more nor less than is needed, comments should be relevant and apply to the
previous conversation, comments should be clear and comprehensible, and comments should be correct in a
given context. Accordingly, FIG. 7 is a flow diagram of the operation of politeness rules 504 that implements
Grice's maxims for politeness in accordance with one embodiment of the present invention.

At stage 702, it is determined whether help is required or requested by the user. If so, operation 
proceeds to stage 704. Otherwise, operation proceeds to stage 706. 

At stage 704, it is determined whether the user is requiring repeated help in the same session or
across sessions (i.e., a user is requiring help more than once in the current session). If so, operation proceeds
to stage 712. Otherwise, operation proceeds to stage 710.

At stage 706, it is determined whether a particular prompt is being repeated in the same session (i.e.,
the same session with a particular user) or across sessions. If so, operation proceeds to stage 708. At stage
708, politeness rules 504 select a shortened prompt (e.g., a shortened recorded prompt) for output by
voice user interface software 102. This shortened prompt rule is based on a social-psychology empirical
observation that the length of prompts should become shorter within a session and across sessions, unless the
user is having trouble, in which case the prompts should become longer (e.g., more detailed).

At stage 712, a lengthened help explanation (e.g., recorded prompt) is selected for output by voice 
user interface software 102. For example, the lengthened help explanation can be provided to a user based on 
the user's help requirements in the current session and across sessions (e.g., personality engine 104 maintains 
state information for each user of computer system 100). This lengthened help rule is based on a social- 
psychology empirical observation that help explanations should get longer and more detailed both within a 
session and across sessions. 

At stage 710, a prompt that provides context-sensitive help is selected for output by voice user 
interface software 102. For example, the context-sensitive help includes informing the user of the present 
state of the user's session and available options (e.g., an explanation of what the user can currently instruct the 
system to do at the current stage of operation). This context-sensitive help rule is based on a social- 
psychology empirical observation that a system should provide the ability to independently request, in a 
context-sensitive way, any of the following: available options, the present state of the system, and an 
explanation of what the user can currently instruct the system to do at the current stage of operation. 
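
The branching of stages 702 through 712 can be summarized in a short sketch. The function name and the returned labels are hypothetical, and the final fall-through case (no help needed and no repeated prompt) is an assumption, since the text does not describe that path.

```python
def politeness_rules(help_needed: bool,
                     repeated_help: bool,
                     prompt_repeated: bool) -> str:
    """Illustrative reading of stages 702-712 of FIG. 7."""
    if help_needed:                      # stage 702
        if repeated_help:                # stage 704
            return "lengthened_help"     # stage 712
        return "context_sensitive_help"  # stage 710
    if prompt_repeated:                  # stage 706
        return "shortened_prompt"        # stage 708
    # Assumption: otherwise the prompt is left unchanged.
    return "default_prompt"
```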

In one embodiment, a prompt is selected for output by voice user interface software 102, in which the 
selected prompt includes terms that are recognized by voice user interface with personality 103 (e.g., within 
the recognition grammar of the voice user interface with personality). This functionality is based on the 
social-psychology empirical observation that it is polite social behavior to use words introduced by the other 
person (in this case the voice user interface with personality) in conversation. Thus, this functionality is 
advantageous, because it increases the probability that a user will interact with voice user interface with 
personality 103 using words that are recognized by the voice user interface with personality. Politeness rules 
504 can also include a rule that when addressing a user by name, voice user interface with personality 103 
addresses the user by the user's proper name, which generally represents a socially polite manner of 
addressing a person (e.g., a form of flattery).

Another social-psychology empirical observation that can be implemented by politeness rules 504
and executed as appropriate during the operation of politeness rules 504 is that when there is a trade-off
between technical accuracy and comprehensibility, voice user interface with personality 103 should choose the
latter. Yet another social-psychology empirical observation that can be implemented by politeness rules 504
and executed as appropriate during the operation of politeness rules 504 is that human beings generally speak
using varied responses (e.g., phrases) while interacting in a dialog with another human being, and thus,
politeness rules 504 include a rule for selecting varied responses (e.g., randomly selecting among multiple
recorded prompts available for a particular response) for output by voice user interface software 102.

FIG. 8 is a flow diagram of the operation of multiple voices rules 506 of personality engine 104 of
FIG. 5 in accordance with one embodiment of the present invention. Multiple voices rules 506 include rules
that are based on the following social-psychology theories: different voices should be different social actors,
disfluencies in speech are noticed, and disfluencies make the speakers seem less intelligent. Accordingly,
FIG. 8 is a flow diagram of the operation of multiple voices rules 506 that implements these social-psychology
theories in accordance with one embodiment of the present invention.

At stage 802, it is determined whether two voices are needed by voice user interface with personality
103 while interacting with a user. If two voices are needed, then operation proceeds to stage 804.

At stage 804, a smooth hand-off prompt is selected, which provides a smooth hand-off between the
two voices to be used while interacting with the user. For example, a smooth hand-off is provided between
the recorded voice output by the recorded speech software and the synthesized voice output by the TTS
software. For instance, voice user interface with personality 103 outputs "I will have your email read to you"
to provide a transition between the recorded voice of recorded speech software 411 and the synthesized voice
of TTS software 411. This smooth hand-off rule is based on a social-psychology empirical observation that
there should be a smooth transition from one voice to another.

At stage 806, prompts are selected for output by each voice such that each voice utters an
independent sentence. For each voice, an appropriate prompt is selected that is an independent sentence, and
each voice then utters the selected prompt, respectively. For example, rather than outputting "[voice 1] Your
email says [voice 2] ...", voice user interface with personality 103 outputs "I will have your email read to you"
using the recorded voice of recorded speech software 411, and voice user interface with personality 103
outputs "Your current email says ..." using the synthesized voice of TTS software 411. This independent
sentences rule is based on a social-psychology empirical observation that two different voices should not utter
different parts of the same sentence.
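
A minimal sketch of the hand-off and independent-sentences rules follows. The voice labels, function names, and the punctuation-based check are hypothetical simplifications: the sketch treats a segment as an independent sentence if it ends in terminal punctuation, which is only a rough proxy.

```python
from typing import List, Tuple

RECORDED, SYNTHESIZED = "recorded", "synthesized"


def email_readout(email_text: str) -> List[Tuple[str, str]]:
    """Plan an email readout as (voice, sentence) pairs.

    The recorded voice utters the hand-off prompt as a complete sentence,
    and the synthesized voice then utters its own complete sentence.
    """
    return [
        (RECORDED, "I will have your email read to you."),
        (SYNTHESIZED, f"Your current email says: {email_text}"),
    ]


def each_voice_speaks_whole_sentences(plan: List[Tuple[str, str]]) -> bool:
    """Rough check that no sentence is split across two voices."""
    return all(text.rstrip().endswith((".", "!", "?")) for _, text in plan)
```

For example, a plan such as `[("recorded", "Your email says"), ("synthesized", "Lunch at noon.")]` fails the check because the first voice stops mid-sentence.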

The personality engine can also implement various rules for a voice user interface with personality to
invoke elements of team affiliation. For example, voice user interface with personality 103 can invoke team
affiliation by outputting recorded prompts that use pronouns such as "we" rather than "you" or "I" when
referring to tasks to be performed or when referring to problems during operation of the system. This concept
of team affiliation is based on social-psychology empirical observations that indicate that a user of a system is
more likely to enjoy and prefer using the system if the user feels a team affiliation with the system. For
example, providing a voice user interface with personality that invokes team affiliation is useful and
advantageous for a subscriber service, in which the users are subscribers of a system that provides various
services, such as the system discussed below with respect to FIG. 9. Thus, a subscriber will likely be more
forgiving and understanding of possible problems that may arise during use of the system, and hence, more
likely to continue to be a subscriber of the service if the subscriber enjoys using the system, in part through a
team affiliation with the voice user interface with personality of the system.

The above discussed social-psychology empirical observations are further discussed and supported in 
The Media Equation, written by Byron Reeves and Clifford Nass, and published by CSLI Publications (1996). 



A Voice User Interface With Personality For An Application 

FIG. 9 is a block diagram of a voice user interface with personality for an application in accordance
with one embodiment of the present invention. System 900 includes a voice user interface with personality
103 shown in greater detail in accordance with one embodiment of the present invention. System 900
includes an application 902 that interfaces with voice user interface with personality 103.

Voice user interface with personality 103 can be stored in a memory of system 900. Voice user
interface with personality 103 provides the user interface for application 902 executing on system 900 and
interacts with users (e.g., subscribers and contacts of the subscribers) of a service provided by system 900 via
input data signals 904 and output data signals 906.

Voice user interface with personality 103 represents a run-time version of voice user interface with
personality 103 that is executing on system 900 for a particular user (e.g., a subscriber or a contact of the
subscriber). Voice user interface with personality 103 receives input data signals 904 that include speech
signals, which correspond to commands from a user, such as a subscriber. The voice user interface with
personality recognizes the speech signals using a phrase delimiter 908, a recognizer 910, a recognition
manager 912, a recognition grammar 914, and a recognition history 916. Recognition grammar 914 is
installed using a recognition grammar repository 920, which is maintained by application 902 for all
subscribers of system 900. Recognition history 916 is installed or uninstalled using a recognition history
repository 918, which is maintained by application 902 for all of the subscribers of system 900. Input data
signals 904 are received at phrase delimiter 908 and then transmitted to recognizer 910. Recognizer 910
extracts speech signals from input data signals 904 and transmits the speech signals to recognition manager
912. Recognition manager 912 uses recognition grammar 914 and recognition history 916 to recognize a
command that corresponds to the speech signals. The recognized command is transmitted to application 902.

Voice user interface with personality 103 outputs data signals that include voice signals, which
correspond to greetings and responses to the subscriber. The voice user interface with personality generates
the voice signals using a player and synthesizer 922, a prompt manager 924, a pronunciation generator 926, a
prompt suite 928, and a prompt history 930. Prompt suite 928 is installed using a prompt suite repository 932,
which is maintained by application 902 for all of the subscribers of system 900. Prompt history 930 is
installed or uninstalled using a prompt history repository 934, which is maintained by application 902 for all
of the subscribers of system 900. Application 902 transmits a request to prompt manager 924 for a generic
prompt to be output to the subscriber. Prompt manager 924 determines the interaction state using interaction
state 936. Prompt manager 924 then selects a specific prompt (e.g., one of multiple prompts that correspond
to the generic prompt) from prompt suite 928 based on a prompt history stored in prompt history 930.
Prompt manager 924 transmits the selected prompt to player and synthesizer 922. Player and synthesizer 922
plays a recorded prompt or synthesizes the selected prompt for output via output data signals 906 to the
subscriber.

The voice user interface with personality also includes a barge-in detector 938. Barge-in detector 938
disables output data signals 906 when input data signals 904 are detected.

For example, recognition grammar 914 includes the phrases that result from the scripting and
recording of dialog for a virtual assistant with a particular personality. A phrase is anything that a user can say 
to the virtual assistant that the virtual assistant will recognize as a valid request or response. The grammar 
organizes the phrases into contexts or domains to reflect that the phrases the virtual assistant recognizes may 
depend upon the state of the user's interactions with the virtual assistant. Each phrase has both a specific 
name and a generic name. Two or more phrases (e.g., "Yes" and "Sure") can share the same generic name but 
not the same specific name. All recognition grammars define the same generic names but not necessarily the 
same specific names. Two recognition grammars can include different numbers of phrases and so define 
different numbers of specific names. 
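
One way to picture the grammar structure described above is as phrases grouped by context, each carrying a unique specific name and a shareable generic name. The class and field names below are hypothetical, and exact text matching stands in for actual speech recognition.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Phrase:
    specific_name: str  # unique within a grammar
    generic_name: str   # may be shared by several phrases
    text: str           # what the user can say


@dataclass
class RecognitionGrammar:
    # context name -> normalized text -> phrase
    contexts: Dict[str, Dict[str, Phrase]] = field(default_factory=dict)

    def add(self, context: str, phrase: Phrase) -> None:
        """Add a phrase to a context, enforcing unique specific names."""
        for phrases in self.contexts.values():
            if any(p.specific_name == phrase.specific_name for p in phrases.values()):
                raise ValueError(f"duplicate specific name: {phrase.specific_name}")
        self.contexts.setdefault(context, {})[phrase.text.lower()] = phrase

    def recognize(self, context: str, utterance: str) -> Optional[Phrase]:
        """Return the phrase valid in this context, or None if unrecognized."""
        return self.contexts.get(context, {}).get(utterance.strip().lower())
```

With "Yes" and "Sure" both registered under the generic name `affirmative`, an application can act on the generic name while the history records which specific phrase the user actually said.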

While a recognition grammar is created largely at design time, at run-time the application can
customize the recognition grammar for the subscriber (e.g., with the proper names of his or her contacts). 
Pronunciation generator 926 allows for custom pronunciations for custom phrases and, thus, a subscriber- 
specific grammar. For example, pronunciation generator 926 is commercially available from Nuance 
Corporation of Menlo Park, CA. 

Recognition history 916 maintains the subscriber's experience with a particular recognition grammar. 
Recognition history 916 includes the generic and specific names of the phrases in the recognition grammar 
and the number of times the voice user interface with personality has heard the user say each phrase. 

In one embodiment, application 902 allows the subscriber to select a virtual assistant that provides a 
voice user interface with a particular personality and which includes a particular recognition grammar. 
Application 902 preserves the selection in a non-volatile memory. To initialize the virtual assistant for a 
session with the subscriber or one of the subscriber's contacts, application 902 installs the appropriate
recognition grammar 914. When initializing the virtual assistant, application 902 also installs the subscriber's 
recognition history 916. For the subscriber's first session, an empty history is installed. At the end of each 
session with the subscriber, application 902 uninstalls and preserves the updated recognition history 916.
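
The install and uninstall cycle for a subscriber's history can be sketched as follows. The `HistoryRepository` class and the use of an in-memory dictionary are assumptions for illustration; the text only says that the history is preserved between sessions and that a first session starts with an empty history.

```python
from collections import Counter
from typing import Dict


class HistoryRepository:
    """Hypothetical stand-in for recognition history repository 918."""

    def __init__(self) -> None:
        self._store: Dict[str, Counter] = {}

    def install(self, subscriber_id: str) -> Counter:
        """Return a working copy of the history (empty for a first session)."""
        return Counter(self._store.get(subscriber_id, Counter()))

    def uninstall(self, subscriber_id: str, history: Counter) -> None:
        """Preserve the updated history at the end of a session."""
        self._store[subscriber_id] = Counter(history)


repo = HistoryRepository()
history = repo.install("alice")   # first session: empty history
history["yes_1"] += 1             # the user said the phrase named "yes_1" once
repo.uninstall("alice", history)  # end of session: preserve the update
```

The counts are keyed by specific phrase name here, matching the description of a history that records how many times each phrase has been heard.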

The voice user interface with personality recognizes input data signals 904, which involves
recognizing the subscriber's utterance as one of the phrases stored in recognition grammar 914, and updating
recognition history 916 and interaction state 936 accordingly. The voice user interface with personality
returns the generic and specific names of the recognized phrase.

In deciding what the subscriber said, the voice user interface with personality considers not only
recognition grammar 914, but also both recognition history 916, which stores the phrases that the subscriber
has previously stated to the virtual assistant, and prompt history 930, which stores the prompts that the virtual
assistant previously stated to the subscriber.

Prompt suite 928 includes the prompts that result from the scripting and recording of a virtual
assistant with a particular personality. A prompt is anything that the virtual assistant can say to the subscriber.
Prompt suite 928 includes synthetic as well as recorded prompts. A recorded prompt is a recording of a
human voice saying the prompt, which is output using player and synthesizer 922. A synthetic prompt is a
written script for which a voice is synthesized when the prompt is output using player and synthesizer 922. A
synthetic prompt has zero or more formal parameters for which actual parameters are substituted when the
prompt is played. For example, to announce the time, application 902 plays "It's now <time>", supplying the
current time. The script and its actual parameters may give pronunciations for the words included in the
prompt. Prompt suite 928 may be designed so that a user attributes the recorded prompts and synthetic
prompts (also referred to as speech markup) to different personae (e.g., the virtual assistant and her helper,
respectively). Each prompt includes both a specific name (e.g., a specific prompt) and a generic name (e.g., a
specific prompt corresponds to a generic prompt, and several different specific prompts can correspond to the
generic prompt). Two or more prompts (e.g., "Yes" and "Sure") can share the same generic name but not the
same specific name. All suites define the same generic names but not necessarily the same specific names.
Two prompt suites can include different numbers of prompts and, thus, define different numbers of specific
names.
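
The substitution of actual parameters for formal parameters in a synthetic prompt can be sketched with a simple template function. The function name and the `<name>` matching pattern are assumptions based on the "It's now <time>" example; the actual script format is not specified in the text.

```python
import re
from typing import Mapping


def render_synthetic_prompt(script: str, actuals: Mapping[str, str]) -> str:
    """Replace each <name> formal parameter with its actual value.

    Raises KeyError if the script names a parameter that was not supplied.
    """
    return re.sub(r"<(\w+)>", lambda m: actuals[m.group(1)], script)


print(render_synthetic_prompt("It's now <time>", {"time": "a quarter past four"}))
# prints: It's now a quarter past four
```

A script with zero formal parameters passes through unchanged, which matches the statement that a synthetic prompt has zero or more parameters.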

For example, prompt suite 928 includes the virtual assistant's responses to the subscriber's explicit 
coaching requests. These prompts share a generic name. There is one prompt for each possible state of the 
virtual assistant's interaction with the user.

Although prompt suite 928 is created at design time, at run-time application 902 can customize
prompt suite 928 for the subscriber (e.g., with the proper names of the subscriber's contacts using
pronunciation generator 926 to generate pronunciations for custom synthetic prompts). Thus, prompt suite
928 is subscriber-specific.

Prompt history 930 documents the subscriber's experience with a particular prompt suite. Prompt 
history 930 includes the generic and specific names of the prompts stored in prompt suite 928 and how often 
the voice user interface with personality has played each prompt for the subscriber. 

In one embodiment, application 902 allows the subscriber to select a virtual assistant and, thus, a 
voice user interface with a particular personality that uses a particular prompt suite. Application 902 preserves 
the selection in non-volatile memory. To initialize the selected virtual assistant for a session with the 
subscriber or a contact of the subscriber, application 902 installs the appropriate prompt suite. When 
initializing the virtual assistant, application 902 also installs the subscriber's prompt history 930. For the 
subscriber's first session, application 902 installs an empty history. At the end of each session, application 
902 uninstalls and preserves the updated history. 

Application 902 can request that the voice user interface with personality play for the user a generic
prompt in prompt suite 928. The voice user interface with personality selects a specific prompt that
corresponds to the generic prompt in one of several ways, some of which require a clock (not shown in FIG.
9) or a random number generator (not shown in FIG. 9), and updates prompt history 930 accordingly. For
example, application 902 requests that the voice user interface with personality play a prompt that has a
generic name (e.g., context-sensitive coaching responses), or application 902 requests that the voice user
interface with personality play a prompt that has a particular generic name (e.g., that of an affirmation). In
selecting a specific prompt that corresponds to the generic prompt, the voice user interface with personality
considers both prompt history 930 (i.e., what the virtual assistant has said to the subscriber) and recognition
history 916 (i.e., what the user has said to the virtual assistant). In selecting a specific prompt, the voice user
interface with personality selects at random (e.g., to provide varied responses) one of two or more equally
favored specific prompts.
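
A minimal sketch of selecting a specific prompt for a generic name follows. It assumes, purely for illustration, that the least-played prompts are the "equally favored" ones and that ties are broken at random; the text does not define how prompts are ranked, and this sketch consults only the prompt history, not the recognition history. All names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Optional


def select_specific_prompt(generic_name: str,
                           suite: Dict[str, List[str]],
                           prompt_history: Counter,
                           rng: Optional[random.Random] = None) -> str:
    """Pick a specific prompt for a generic name and update the history.

    Assumption: prompts played the fewest times are equally favored.
    """
    rng = rng or random.Random()
    candidates = suite[generic_name]
    fewest = min(prompt_history[name] for name in candidates)
    favored = [name for name in candidates if prompt_history[name] == fewest]
    choice = rng.choice(favored)   # random tie-break gives varied responses
    prompt_history[choice] += 1    # record that this prompt was played
    return choice
```

Passing a seeded `random.Random` makes the selection reproducible for testing, while the default gives varied responses in normal use.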

Prompt suite 928 includes two or more greetings (e.g., "Hello", "Good Morning", and "Good
Evening"). The greetings share a particular generic name. Application 902 can request that the voice user
interface with personality play one of the prompts with the generic name for the greetings. The voice user
interface with personality selects among the greetings appropriate for the current time of day (e.g., as it would
when playing a generic prompt).

Prompt suite 928 includes farewells (e.g., "Good-bye" and "Good night"). The farewell prompts
share a particular generic name. Application 902 can request that the voice user interface with personality play
one of the prompts with the generic name for the farewells. The voice user interface with personality selects
among the farewells appropriate for the current time of day.
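
Filtering greetings or farewells by the current time of day, which requires the clock mentioned above, might look like the following sketch. The hour ranges in the table are hypothetical; the text gives only the example prompts, not the hours at which each applies.

```python
from datetime import time
from typing import List

# (specific prompt, first valid hour, end hour exclusive); hours are assumptions.
GREETINGS = [
    ("Hello", 0, 24),
    ("Good Morning", 5, 12),
    ("Good Evening", 17, 24),
]


def greetings_for(now: time) -> List[str]:
    """Return the greetings appropriate for the given time of day."""
    return [text for text, start, end in GREETINGS if start <= now.hour < end]
```

The voice user interface could then choose among the returned candidates, for example at random, in the same way as for any other generic prompt.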

Application 902 can request that the voice user interface with personality play a prompt that has a
particular generic name (e.g., a help message for a particular situation) and to select a prompt that is longer in
duration than the previously played prompts. In selecting the longer prompt, the voice user interface with
personality consults prompt history 930.

Application 902 can request that the voice user interface with personality play a prompt that has a 
particular generic name (e.g., a request for information from the user) and to select a prompt that is shorter in 
duration than the previously played prompts. In selecting the shorter prompt, the voice user interface with 
personality consults prompt history 930. 
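
Selecting a prompt that is longer or shorter than those already played can be sketched as below. The function name, the duration table, and the choice of the nearest qualifying duration are assumptions; the text says only that prompt history 930 is consulted.

```python
from typing import Dict, List, Optional


def select_by_duration(candidates: Dict[str, float],
                       played: List[str],
                       longer: bool) -> Optional[str]:
    """Pick the next prompt strictly longer (or shorter) than all played ones.

    `candidates` maps specific prompt names to durations in seconds, and
    `played` lists the names already played for this generic name.
    Returns None when no candidate qualifies.
    """
    durations = [candidates[name] for name in played]
    if longer:
        bound = max(durations, default=float("-inf"))
        eligible = [n for n, d in candidates.items() if d > bound]
        return min(eligible, key=candidates.get, default=None)
    bound = min(durations, default=float("inf"))
    eligible = [n for n, d in candidates.items() if d < bound]
    return max(eligible, key=candidates.get, default=None)
```

Choosing the nearest qualifying duration lets help explanations grow, or routine prompts shrink, one step at a time instead of jumping straight to an extreme.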

Application 902 can request that the voice user interface with personality play a prompt (e.g., a joke)
at a particular probability and, thus, the voice user interface with personality sometimes plays nothing.
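
Playing a prompt with a given probability, using the random number generator mentioned earlier, can be sketched as follows. The function name and signature are hypothetical.

```python
import random
from typing import Optional


def maybe_play(prompt: str,
               probability: float,
               rng: Optional[random.Random] = None) -> Optional[str]:
    """Return the prompt with the given probability, otherwise None."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    rng = rng or random.Random()
    # random() lies in [0.0, 1.0), so 0.0 never plays and 1.0 always plays.
    return prompt if rng.random() < probability else None
```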

Application 902 can request that the voice user interface with personality play a prompt (e.g., a
remark that the subscriber may infer as critical) at reduced volume.

Application 902 can request that the voice user interface with personality play an approximation
prompt. An approximation prompt is a prompt output by the virtual assistant so that the virtual assistant is
understood by the subscriber, at the possible expense of precision. For example, an approximation prompt for
the current time of day can approximate the current time to the nearest quarter of an hour such that the virtual
assistant, for example, informs the subscriber that the current time is "A quarter past four P.M." rather than
overwhelming the user with the exact detailed time of "4:11:02 PM".
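
The quarter-hour approximation in this example can be worked through in a short sketch. The function name and the exact wording of the spoken forms are assumptions; only the rounding to the nearest quarter of an hour comes from the text.

```python
from datetime import datetime, timedelta

HOUR_NAMES = ["twelve", "one", "two", "three", "four", "five",
              "six", "seven", "eight", "nine", "ten", "eleven"]


def approximate_time(now: datetime) -> str:
    """Describe the time rounded to the nearest quarter of an hour."""
    seconds = now.minute * 60 + now.second
    quarters = (seconds + 450) // 900  # round to the nearest 15 minutes
    rounded = (now.replace(minute=0, second=0, microsecond=0)
               + timedelta(minutes=15 * quarters))
    if rounded.minute == 45:
        # Speak 45 past the hour as "a quarter to" the following hour.
        target = rounded + timedelta(minutes=15)
        lead = "a quarter to "
    else:
        target = rounded
        lead = {0: "", 15: "a quarter past ", 30: "half past "}[rounded.minute]
    name = HOUR_NAMES[target.hour % 12]
    suffix = "A.M." if target.hour < 12 else "P.M."
    if rounded.minute == 0:
        return f"{name} o'clock {suffix}"
    return f"{lead}{name} {suffix}"
```

For the time 4:11:02 PM, the 662 seconds past the hour round to one quarter, so the function returns "a quarter past four P.M.".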

In one embodiment, application 902 provides various functionality including an email service, a
stock quote service, a news content service, and a voice mail service. Subscribers access a service provided by
system 900 via telephones or modems (e.g., using telephones, mobile phones, PDAs, or a standard computer
executing a WWW browser such as the commercially available Netscape NAVIGATOR™ browser). System
900 allows subscribers via telephones to collect messages from multiple voice mail systems, scan voice
messages, and manipulate voice messages (e.g., delete, save, skip, and forward). System 900 also allows
subscribers via telephones to receive notification of email messages, scan email messages, read email
messages, respond to email messages, and compose email messages. System 900 allows subscribers via
telephones to set up a calendar, make appointments and to-do lists using a calendar, add contacts to an address
book, find a contact in an address book, call a contact in an address book, schedule a new appointment in a
calendar, search for appointments, act upon a found appointment, edit to-do lists, read to-do lists, and act upon
to-do lists. System 900 allows subscribers via telephones to access various WWW content. System 900
allows subscribers to access various stock quotes. Subscribers can also customize the various news content,
email content, voice mail content, and WWW content that system 900 provides to the subscriber.

System 900 advantageously includes a voice user interface with personality that acts as a virtual 
assistant to a subscriber of the service. For example, the subscriber can customize the voice user interface 
with personality to access and act upon the subscriber's voice mail, email, faxes, pages, personal information 
manager (PIM), and calendar (CAL) information through both a telephone and a WWW browser (e.g., the 
voice user interface with personality is accessible via the subscriber's mobile phone or telephone by dialing a 
designated phone number to access the service). 

In one embodiment, the subscriber selects from several different personalities when selecting a virtual
assistant. For example, the subscriber can interview virtual assistants with different personalities in order to
choose the voice user interface with a personality that is best suited for the subscriber's needs, business, or the 
subscriber's own personality. A subscriber who is in a sales field may want an aggressive voice user interface 
with personality that puts incoming calls through, but a subscriber who is an executive may want a voice user 
interface with personality that takes more of an active role in screening calls and only putting through 
important calls during business hours. Thus, the subscriber can select a voice user interface with a particular 
personality. 

As discussed above, to further the perception of true human interaction, the virtual assistant responds
with different greetings, phrases, and confirmations just as a human assistant would. For example, some of these
different greetings are related to a time of day (e.g., "good morning" or "good evening"). Various humorous
interactions are included to add to the personality of the voice user interface, as further discussed below.
There are also different modes for the voice user interface with personality throughout the service. These
different modes of operation are based on a social-psychology empirical observation that while some people
like to drive, others prefer to be driven. Accordingly, subscribers can have the option of easily switching from
a more verbose learning mode to an accelerated mode that provides only the minimum prompts required to
complete an action.

Dialog 

FIG. 10 is a functional diagram of a dialog interaction between a voice user interface with personality
1002 (e.g., voice user interface with personality 103) and a subscriber 1004 in accordance with one
embodiment of the present invention. When subscriber 1004 logs onto a system that includes voice user
interface with personality 1002, such as system 900, voice user interface with personality 1002 provides a
greeting 1006 to subscriber 1004. For example, greeting 1006 can be a prompt that is selected based on the
current time of day.

Voice user interface with personality 1002 then interacts with subscriber 1004 using a dialog 1008,
which gives subscriber 1004 the impression that the voice user interface of the system has a personality.

If subscriber 1004 selects a particular command provided by the system, such as by speaking a
command that is within the recognition grammar of voice user interface with personality 1002, then the
system executes the command selection as shown at execute operation 1010.

Before subscriber 1004 logs off of the system, voice user interface with personality 1002 provides a
farewell 1012 to subscriber 1004. For example, farewell 1012 can be a prompt that is selected based on the
current time of day.

FIG. 11 is a flow diagram of the operation of voice user interface with personality 1002 of FIG. 10
during an interaction with a subscriber in accordance with one embodiment of the present invention. At stage
1102, voice user interface with personality 1002 determines whether a recorded prompt needs to be output to
the subscriber. If so, operation proceeds to stage 1104.

At stage 1104, voice user interface with personality 1002 determines whether there is a problem (e.g.,
the user is requesting to access email, and the email server of the system is down, and thus, unavailable). If
so, operation proceeds to stage 1106. Otherwise, operation proceeds to stage 1108. At stage 1106, voice user
interface with personality 1002 executes negative comments rules (e.g., negative comments rules 502).

At stage 1108, voice user interface with personality 1002 determines whether multiple voices are
required at this stage of operation during interaction with the subscriber (e.g., the subscriber is requesting that
an email message be read to the subscriber, and TTS software 411 uses a synthesized voice to read the text of
the email message, which is a different voice than the recorded voice of recorded speech software 411). If so,
operation proceeds to stage 1110. Otherwise, operation proceeds to stage 1112. At stage 1110, voice user
interface with personality 1002 executes multiple voices rules (e.g., multiple voices rules 506).

At stage 1112, voice user interface with personality 1002 executes politeness rules (e.g., politeness
rules 504). At stage 1114, voice user interface with personality 1002 executes expert/novice rules (e.g.,
expert/novice rules 508). At stage 1116, voice user interface with personality 1002 outputs the selected
prompt based on the execution of the appropriate rules.
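
The ordering of rule sets in FIG. 11 can be sketched as a function that lists which steps run for a given situation. The function name and step labels are hypothetical. The sketch also assumes that operation continues to the next stage after stages 1106 and 1110, which the text implies but does not state.

```python
from typing import List


def rules_to_execute(prompt_needed: bool,
                     problem: bool,
                     multiple_voices: bool) -> List[str]:
    """Illustrative reading of stages 1102-1116 of FIG. 11."""
    if not prompt_needed:                             # stage 1102
        return []
    steps: List[str] = []
    if problem:                                       # stage 1104
        steps.append("negative comments rules 502")   # stage 1106
    if multiple_voices:                               # stage 1108
        steps.append("multiple voices rules 506")     # stage 1110
    steps.append("politeness rules 504")              # stage 1112
    steps.append("expert/novice rules 508")           # stage 1114
    steps.append("output selected prompt")            # stage 1116
    return steps
```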

As discussed above with respect to FIG. 9, system 900 includes functionality such as calendar 
functionality that, for example, allows a subscriber of system 900 to maintain a calendar of appointments. In 
particular, the subscriber can modify an appointment previously scheduled for the subscriber's calendar.

FIG. 12 provides a command specification of a modify appointment command for system 900 in 
accordance with one embodiment of the present invention. FIG. 12 shows the command syntax of the modify 
appointment command, which is discussed above. For example, a subscriber can command voice user
interface with personality 1002 (e.g., the subscriber commands the application through voice user interface
with personality 1002) to modify an appointment by stating, "modify an appointment on June 13 at 3 p.m."

The command syntax of FIG. 12 provides a parse of the modify appointment command as follows: "modify" 
represents the command, "appointment" represents the object of the command, "date" represents option 1 of 
the command, and "time" represents option 2 of the command. The subscriber can interact with voice user
interface with personality 1002 using a dialog to provide a command to the system to modify an appointment. 
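
A parse of this kind can be sketched with a regular expression. The pattern below is only an assumption about how the syntax of FIG. 12 might be matched (the actual system uses a recognition grammar, discussed below); the names MODIFY_APPOINTMENT and parse_command are hypothetical.

```python
import re

# Hypothetical pattern: command, object, option 1 (date), option 2 (time).
MODIFY_APPOINTMENT = re.compile(
    r"^(?P<command>modify)\s+(?:an\s+)?(?P<object>appointment)"
    r"(?:\s+on\s+(?P<date>\w+\s+\d{1,2}))?"
    r"(?:\s+at\s+(?P<time>\d{1,2}(?::\d{2})?\s*(?:a\.m\.|p\.m\.)))?$",
    re.IGNORECASE,
)


def parse_command(utterance: str):
    """Return a dict with command, object, date, and time, or None if there is no match."""
    match = MODIFY_APPOINTMENT.match(utterance.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_command("modify an appointment on June 13 at 3 p.m."))
```

Both options are optional in this sketch, so a bare "modify appointment" still parses, with the date and time left as None for the dialog to fill in.
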

FIGS. 13A and 13B are a flow diagram of a dialog for a modify appointment command between
voice user interface with personality 1002 and a subscriber in accordance with one embodiment of the present
invention. The dialog for the modify appointment command implements the rules that provide a voice user
interface with personality, as discussed above (e.g., negative comments rules 502, politeness rules 504,
multiple voices rules 506, and expert/novice rules 508 of personality engine 104). 

Referring to FIG. 13A, at stage 1302, voice user interface with personality 1002 recognizes a modify
appointment command spoken by a subscriber. At stage 1304, voice user interface with personality 1002
confirms with the subscriber an appointment time to be changed.

At stage 1306, voice user interface with personality 1002 determines whether the confirmed 
appointment time to be changed represents the right appointment to be modified. If so, operation proceeds to 
stage 1312. Otherwise, operation proceeds to stage 1308. At stage 1308, voice user interface with personality 
1002 informs the subscriber that voice user interface with personality 1002 needs the correct appointment to
be modified; in other words, voice user interface with personality 1002 needs to determine the start time of the
appointment to be modified. At stage 1310, voice user interface with personality 1002 determines the start 
time of the appointment to be modified (e.g., by asking the subscriber for the start time of the appointment to 
be modified). 

At stage 1312, voice user interface with personality 1002 determines what parameters of
the appointment to modify. At stage 1314, voice user interface with personality 1002 determines whether the
appointment is to be deleted. If so, operation proceeds to stage 1316, and the appointment is deleted.
Otherwise, operation proceeds to stage 1318. At stage 1318, voice user interface with personality 1002
determines whether a new date is needed, in other words, whether the date of the appointment is to be
changed. If so, operation proceeds to stage 1320, and the date of the appointment is modified. Otherwise,
operation proceeds to stage 1322. At stage 1322, voice user interface with personality 1002 determines
whether a new start time is needed. If so, operation proceeds to stage 1324, and the start time of the
appointment is modified. Otherwise, operation proceeds to stage 1326. At stage 1326, voice user interface
with personality 1002 determines whether a new duration of the appointment is needed. If so, operation
proceeds to stage 1328, and the duration of the appointment is modified. Otherwise, operation proceeds to
stage 1330. At stage 1330, voice user interface with personality 1002 determines whether a new invitee name
is needed. If so, operation proceeds to stage 1332. Otherwise, operation proceeds to stage 1334. At stage 
1332, voice user interface with personality 1002 determines the new invitee name of the appointment.

Referring to FIG. 13B, at stage 1336, voice user interface with personality 1002 determines whether
it needs to try again to obtain the name of the invitee to be modified. If so, operation proceeds to stage 1338 to
determine the name of the invitee to be modified. Otherwise, operation proceeds to stage 1340. At stage
1340, voice user interface with personality 1002 confirms the name of the invitee to be modified. At stage
1342, the invitee name is modified.

At stage 1334, voice user interface with personality 1002 determines whether a new event description 
is desired by the subscriber. If so, operation proceeds to stage 1344, and the event description of the 
appointment is modified appropriately. Otherwise, operation proceeds to stage 1346. At stage 1346, voice 
user interface with personality 1002 determines whether a new reminder status is desired by the subscriber. If
so, operation proceeds to stage 1348, and the reminder status of the appointment is modified appropriately.
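
The chain of checks in stages 1314 through 1348 can be sketched as follows. This is a simplified illustration under stated assumptions: the Appointment fields and the modify_appointment function are hypothetical, the invitee retry and confirmation of stages 1336 through 1342 are omitted, and the sketch assumes that after each modification the dialog continues to the next check, which the description above leaves open.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Appointment:
    """Hypothetical appointment record with the parameters named in FIGS. 13A and 13B."""
    date: str
    start_time: str
    duration_minutes: int
    invitee: str
    description: str
    reminder: bool


# Order in which the dialog asks about each parameter (stages 1318 through 1346).
CHECK_ORDER = ("date", "start_time", "duration_minutes", "invitee", "description", "reminder")


def modify_appointment(appt: Appointment, changes: dict) -> Optional[Appointment]:
    """Apply the subscriber's requested changes; return None if the appointment is deleted."""
    if changes.get("delete"):          # stages 1314 and 1316
        return None
    updated = appt
    for name in CHECK_ORDER:           # each check either modifies the field or moves on
        if name in changes:
            updated = replace(updated, **{name: changes[name]})
    return updated
```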

FIG. 14 shows an excerpt of the dialog for the modify appointment command of voice user interface 
with personality 1002. As shown in FIG. 14, the dialog for the modify appointment command is 
advantageously organized and arranged in four columns. The first column (left-most column) represents the 
label column, which represents a label for levels within a flow of control hierarchy during execution of voice
user interface with personality 1002. The second column (second left-most column) represents the column
that indicates what the user says as recognized by voice user interface with personality 1002 (e.g., within the 
recognition grammar of voice user interface with personality 1002, as discussed below). The third column 
(third left-most column) represents the flow control column. The flow control column indicates the flow of 
control for the modify appointment command as executed by voice user interface with personality 1002 in
response to commands and responses by the subscriber and any problems that may arise during the dialog for
the modify appointment command. The fourth column (right-most column) represents what voice user 
interface with personality 1002 says (e.g., recorded prompts output) to the subscriber during the modify 
appointment dialog in its various stages of flow control. 

As shown in FIG. 14, the fourth column provides the dialog as particularly output by voice user
interface with personality 1002. FIG. 14 also shows that voice user interface with personality 1002 has several
options at various stages for prompts to play back to the subscriber. The dialog for the modify appointment 
command as shown in FIG. 14 is selected according to the rules that provide a voice user interface with 
personality, as discussed above. The four-column arrangement shown in FIG. 14 also advantageously allows
for the generation of dialogs for various commands of a system, such as system 900, that can then easily be
programmed by a computer programmer to implement voice user interface with personality 1002. 
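
One way a programmer might carry the four-column arrangement into code is sketched below. The DialogRow type, the lookup function, and every row in DIALOG are invented for illustration; none of the labels or prompts are copied from FIG. 14.

```python
from typing import List, NamedTuple


class DialogRow(NamedTuple):
    label: str          # first column: level within the flow of control hierarchy
    user_says: str      # second column: what the user says, as recognized
    flow_control: str   # third column: where control goes next
    prompts: List[str]  # fourth column: prompt options the interface can say


# Invented rows for illustration only.
DIALOG = [
    DialogRow("modify_appt", "modify an appointment", "goto confirm_time",
              ["Which appointment would you like to change?"]),
    DialogRow("confirm_time", "yes", "goto choose_parameter",
              ["Okay. What should I change?", "Got it. What needs changing?"]),
    DialogRow("confirm_time", "no", "goto ask_start_time",
              ["Sorry about that. When does the appointment start?"]),
]


def lookup(label: str, user_says: str) -> DialogRow:
    """Find the row for the current label and the recognized user response."""
    for row in DIALOG:
        if row.label == label and row.user_says == user_says:
            return row
    raise KeyError((label, user_says))
```

Keeping several prompts in the fourth field leaves room for the personality rules to choose among them.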

Script the Dialog 

Based on the functional specification of a system such as system 900, a dialog such as the dialog
specification discussed above, and in particular, a set of rules that define a voice user interface with 
personality such as the rules executed by personality engine 104, scripts are written for the dialog executed by 
voice user interface with personality 1002. 

FIG. 15 shows scripts written for a mail domain (e.g., voice mail functionality) of application 902 of 
system 900 in accordance with one embodiment of the present invention. The left column of the table of 
FIG. 15 indicates the location of the flow of control of operation of voice user interface with personality 1002 
within a particular domain (in this case the mail domain), in which the domains and flow of control of 
operation within domains are particularly specified in a finite state machine, as further discussed below. 

Thus, within the mail domain, and within the mail_top_navlist stage of flow control, voice user 
interface with personality 1002 can state any of seven prompts listed in the corresponding right column. For 
example, voice user interface with personality 1002 can select the first listed prompt and, thus, output to the 
subscriber, "What do you want me to do with your mail?". Voice user interface with personality 1002 can 
select the third listed prompt and then say to the subscriber, "Okay, mail's ready. How can I help you?". Or, 
voice user interface with personality 1002 can select the fifth listed prompt and, thus, output to the subscriber,
"What would you like me to do?".

The various prompts selected by voice user interface with personality 1002 obey the personality 
specification, as described above. For example, voice user interface with personality 1002 can select among 
various prompts for the different stages of flow control within a particular domain using personality engine 
104, and in particular, using negative comments rules 502, politeness rules 504, multiple voices rules 506, and 
expert/novice rules 508. 

Varying the selection of various prompts within a session and across sessions for a particular 
subscriber advantageously provides a more human-like dialog between voice user interface with personality 
1002 and the subscriber. Selection of various prompts can also be driven in part by a subscriber's selected
personality type for voice user interface with personality 1002. For example, if the subscriber prefers a voice
user interface with personality 1002 that lets the subscriber drive the use of system 900 (e.g., the subscriber 
has a driver type of personality), then voice user interface with personality 1002 can be configured to provide 
a friendly-submissive personality and to select prompts accordingly. 
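
Varying the prompt within and across sessions can be sketched as a selection that prefers prompts the subscriber has not yet heard. The function choose_prompt and its history list are hypothetical stand-ins (the history plays a role loosely similar to a stored prompt history), and the three prompts are the examples quoted above for the mail domain.

```python
import random

MAIL_TOP_NAVLIST = [
    "What do you want me to do with your mail?",
    "Okay, mail's ready. How can I help you?",
    "What would you like me to do?",
]


def choose_prompt(candidates, history, rng=random):
    """Pick a prompt, preferring ones not yet played to this subscriber."""
    fresh = [p for p in candidates if p not in history]
    choice = rng.choice(fresh or candidates)  # fall back to any prompt once all are used
    history.append(choice)                    # the history could be saved across sessions
    return choice


if __name__ == "__main__":
    played = []
    for _ in range(4):
        print(choose_prompt(MAIL_TOP_NAVLIST, played))
```
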

Voice user interface with personality 1002 can also use dialogs that include other types of
mannerisms and cues that provide the voice user interface with personality, such as laughing to overcome an
embarrassing or difficult situation. For example, within the mail domain and the gu_mail_reply_recipient stage
of flow control, the last listed prompt is as follows, "<Chuckle> This isn't going well, is it? Let's start over." 

The process of generating scripts can be performed by various commercially available services. For 
example, FunArts Software, Inc. of San Francisco, CA, can write the scripts, which inject personality into each
utterance of voice user interface with personality 1002. 

Record the Dialog 

After writing the scripts for the dialog of voice user interface with personality 1002, the scripts are
recorded and stored (e.g., in a standard digital format) in a memory such as memory 101. In one
embodiment, a process of recording scripts involves directing voice talent, such as an actor or actress, to
generate interactive media, such as the dialogs for voice user interface with personality 1002. 

First, an actor or actress is selected to read the appropriate scripts for a particular personality of voice 
user interface with personality 1002. The actor or actress is selected based upon their voice and their style of 
delivery. Then, using different timbres and pitch ranges that the actor or actress has available, a character
voice for voice user interface with personality 1002 is generated and selected for each personality type. Those
skilled in the art of directing voice talent will recognize that some of the variables to work with at this point 
include timbre, pitch, pace, pronunciation, and intonation. There is also an overall task of maintaining 
consistency within the personality after selecting the appropriate character voice. 

Second, the scripts are recorded. Each utterance (e.g., prompt that can be output by voice user
interface with personality 1002 to the subscriber) can be recorded a number of different times with different
reads by the selected actor or actress. The director maintains a detailed and clear image of the personality in
his or her mind in order to keep the selected actor or actress "in character". Accordingly, maintaining a sense
of the utterances within all the possible flow of control options is another important factor to consider when
directing non-linear interactive media, such as the recording of scripts for voice user interface with personality 
1002. For example, unlike narrative, non-linear interactive media, such as the dialog for voice user interface 
with personality 1002, does not necessarily have a predefined and certain path. Instead, each utterance works 
with a variety of potential pathways. User events can be unpredictable, yet the dialog spoken by voice user 
interface with personality 1002 should make sense at all times, as discussed above with respect to FIG. 7. 

A certain degree of flexibility and improvisation in the recording process may also be desirable as 
will be apparent to those skilled in the art of generating non-linear interactive media. However, this is a matter 
of preference for the director. Sometimes the script for an utterance can be difficult to pronounce or deliver in 
character and can benefit from a spur of the moment improvisation by the actor or actress. Often the short, 
character-driven responses that surround an utterance such as a confirmation can respond to the natural sounds 
of the specific actor. Creating and maintaining the "right" feeling for the actor is also important during the 
recording of non-linear media. Because the actor or actress is working in total isolation, without the benefit of 
other actors or actresses to bounce off of, or a coherent story line, and the actor or actress is often reading from 
an unavoidably technical script, it is important that the director maintain a close rapport with the selected actor
or actress during recording and maintain an appropriate energy level during the recording process.

FIG. 16 is a flow diagram for selecting and executing a prompt by voice user interface with 
personality 1002 in accordance with one embodiment of the present invention. At stage 1602, voice user 
interface with personality 1002 determines whether or not a prompt is needed. If so, operation proceeds to 
stage 1604. At stage 1604, application 902 requests that voice user interface with personality 1002 output a
generic prompt (e.g., provides a generic name of a prompt).

At stage 1606, voice user interface with personality 1002 selects an appropriate specific prompt (e.g., 
a specific name of a prompt that corresponds to the generic name). A specific prompt can be stored in a 
memory, such as memory 101, as a recorded prompt in which different recordings of the same prompt 
represent different personalities. For example, voice user interface with personality 1002 uses a rules-based 
engine such as personality engine 104 to select an appropriate specific prompt. The selection of an 
appropriate specific prompt can be based on various factors, which can be specific to a particular subscriber, 
such as the personality type of voice user interface with personality 1002 configured for the subscriber and the
subscriber's expertise with using voice user interface with personality 1002. At stage 1608, voice user
interface with personality 1002 outputs the selected specific prompt to the subscriber.
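
The mapping from a generic prompt name to a specific recorded prompt (stages 1604 through 1608) can be sketched as a keyed lookup. Everything named here is an assumption made for illustration: the RECORDINGS table, its file names, the "novice" and "expert" labels, and the fallback to the novice recording are not taken from the disclosure.

```python
from typing import Dict, Tuple

PromptKey = Tuple[str, str, str]  # (generic name, personality type, expertise)

# Hypothetical catalog of recorded prompts; the file names are invented.
RECORDINGS: Dict[PromptKey, str] = {
    ("mail_top_navlist", "friendly-dominant", "novice"): "fd_mail_top_navlist_long.wav",
    ("mail_top_navlist", "friendly-dominant", "expert"): "fd_mail_top_navlist_short.wav",
    ("mail_top_navlist", "friendly-submissive", "novice"): "fs_mail_top_navlist_long.wav",
}


def select_specific_prompt(generic_name: str, personality: str, expertise: str,
                           recordings: Dict[PromptKey, str] = RECORDINGS) -> str:
    """Resolve a generic prompt name to one recording for this subscriber."""
    key = (generic_name, personality, expertise)
    if key in recordings:
        return recordings[key]
    # Assumed fallback: use the novice recording when no expert variant exists.
    return recordings[(generic_name, personality, "novice")]
```

The application only ever names the generic prompt; the personality type and expertise come from the subscriber's configuration.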

FIG. 17 is a block diagram of a memory 1700 that stores recorded scripts in accordance with one 
embodiment of the present invention. Memory 1700 stores recorded scripts for the mail domain scripts of
FIG. 15, and in particular, for the mail_top_navlist stage of flow of control for various personality types, as
discussed above. Memory 1700 stores recorded mail_top_navlist scripts 1702 for a friendly-dominant
personality, recorded mail_top_navlist scripts 1704 for a friendly-submissive personality, recorded
mail_top_navlist scripts 1706 for an unfriendly-dominant personality, and recorded mail_top_navlist scripts
1708 for an
unfriendly-submissive personality. 

In one embodiment, recorded mail_top_navlist scripts 1702, 1704, 1706, and 1708 can be stored
within personality engine 104 (e.g., in prompt suite 928). Personality engine 104 selects an appropriate
recorded prompt among recorded mail_top_navlist scripts 1702, 1704, 1706, and 1708. The selection of
recorded mail_top_navlist scripts 1702, 1704, 1706, and 1708 by personality engine 104 can be based on the
selected (e.g., configured) personality for voice user interface with personality 1002 for a particular subscriber
and based on previously selected prompts for the subscriber within a current session and across sessions (e.g.,
prompt history 930). For example, personality engine 104 can be executed on computer system 100 and,
during execution, perform such operations as select prompt operation 1604 and select recorded
prompt operation 1606.

The process of recording scripts can be performed by various commercially available services. For
example, FunArts Software, Inc. of San Francisco, CA, writes scripts, directs voice talent in reading the
scripts, and edits the audio tapes of the recorded scripts (e.g., to adjust volume and ensure smooth audio 
transitions within dialogs). 

Finite State Machine Implementation 

Based upon the application of a system, a finite state machine implementation of a voice user
interface with personality is generated. A finite state machine is generated in view of an application, such as
application 902 of system 900, and in view of a dialog, such as dialog 1008 as discussed above. For a
computer-implemented voice user interface with personality, the finite state machine implementation should
be generated in a manner that is technically feasible and practical for coding (programming). 

FIG. 18 is a finite state machine diagram of voice user interface with personality 1002 in accordance 
with one embodiment of the present invention. Execution of the finite state machine begins at a login and 
password state 1810 when a subscriber logs onto system 900. After a successful logon, voice user interface
with personality 1002 transitions to a main state 1800. Main state 1800 includes a time-out handler state 1880 
for time-out situations (e.g., a user has not provided a response within a predetermined period of time), a take- 
a-break state 1890 (e.g., for pausing), and a select domain state 1820. 

From select domain state 1820, voice user interface with personality 1002 determines which domain 
of functionality to proceed to next based upon a dialog (e.g., dialog 1008) with a subscriber. For example, the 
subscriber may desire to record a name, in which case, voice user interface with personality 1002 can 
transition to a record name state 1830. When executing record name state 1830, voice user interface with
personality 1002 transitions to a record name confirm state 1840 to confirm the recorded name. If the
subscriber desires to update a schedule, then voice user interface with personality 1002 can transition to an 
update schedule state 1850. From update schedule state 1850, voice user interface with personality 1002 
transitions to an update schedule confirm state 1860 to confirm the update of the schedule. The subscriber can
also request that voice user interface with personality 1002 read a schedule, in which case, voice user interface
with personality 1002 transitions to a read schedule state 1870 to have the schedule read to the subscriber.
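
The states of FIG. 18 can be sketched as a transition table. The state values below reuse the reference numerals from the description, but the event names are hypothetical, and the sketch makes two simplifying assumptions: entering main state 1800 is modeled as entering select domain state 1820, and the time-out and take-a-break states are reached only from select domain state 1820.

```python
from enum import Enum


class State(Enum):
    LOGIN_AND_PASSWORD = 1810
    SELECT_DOMAIN = 1820        # part of main state 1800
    RECORD_NAME = 1830
    RECORD_NAME_CONFIRM = 1840
    UPDATE_SCHEDULE = 1850
    UPDATE_SCHEDULE_CONFIRM = 1860
    READ_SCHEDULE = 1870
    TIME_OUT_HANDLER = 1880     # part of main state 1800
    TAKE_A_BREAK = 1890         # part of main state 1800


# Event names are invented for this sketch.
TRANSITIONS = {
    (State.LOGIN_AND_PASSWORD, "logon_succeeded"): State.SELECT_DOMAIN,
    (State.SELECT_DOMAIN, "record_name"): State.RECORD_NAME,
    (State.RECORD_NAME, "name_recorded"): State.RECORD_NAME_CONFIRM,
    (State.SELECT_DOMAIN, "update_schedule"): State.UPDATE_SCHEDULE,
    (State.UPDATE_SCHEDULE, "schedule_updated"): State.UPDATE_SCHEDULE_CONFIRM,
    (State.SELECT_DOMAIN, "read_schedule"): State.READ_SCHEDULE,
    (State.SELECT_DOMAIN, "time_out"): State.TIME_OUT_HANDLER,
    (State.SELECT_DOMAIN, "take_a_break"): State.TAKE_A_BREAK,
}


def next_state(state: State, event: str) -> State:
    """Follow a transition if one is defined; otherwise stay in the current state."""
    return TRANSITIONS.get((state, event), state)
```

A table of this kind keeps the dialog flow separate from the code that plays prompts, which is one practical way to keep such a machine easy to program.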

Recognition Grammar 

Voice user interface with personality 1002 includes various recognition grammars that represent the
verbal commands (e.g., phrases) that voice user interface with personality 1002 can recognize when spoken by 
a subscriber. As discussed above, a recognition grammar definition represents a trade-off between accuracy 
and performance as well as other possible factors. It will be apparent to one of ordinary skill in the art of ASR 
technology that the process of defining various recognition grammars is usually an iterative process based on 
use and performance of a system, such as system 900, and voice user interface with personality 1002.

FIG. 19 is a flow diagram of the operation of voice user interface with personality 1002 using a
recognition grammar in accordance with one embodiment of the present invention. At stage 1902, voice user
interface with personality 1002 determines whether or not a subscriber has issued (e.g., spoken) a verbal
command. If so, operation proceeds to stage 1904. At stage 1904, voice user interface with personality 1002
compares the spoken command to the recognition grammar.

At stage 1906, voice user interface with personality 1002 determines whether there is a match 
between the verbal command spoken by the subscriber and a grammar recognized by voice user interface with 
personality 1002. If so, operation proceeds to stage 1908, and the recognized command is executed. 

In one embodiment, at stage 1904, voice user interface with personality 1002 uses the recognition
grammar to interpret the spoken command and, thus, combines stages 1904 and 1906.

Otherwise (i.e., if there is no match at stage 1906), operation proceeds to stage 1910. At stage 1910, voice user interface with personality
1002 requests more information from the subscriber politely (e.g., executing politeness rules 504). 

At stage 1912, voice user interface with personality 1002 determines whether or not there is a match 
between a recognition grammar and the verbal command spoken by the subscriber. If so, operation proceeds 
to stage 1908, and the recognized command is executed.

Otherwise, operation proceeds to stage 1914. At stage 1914, voice user interface with personality 
1002 requests that the subscriber select among various listed command options that are provided at this point 
in the stage of flow of control of a particular domain of system 900. Operation then proceeds to stage 1908 
and the selected command is executed. 
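
The three recognition attempts of FIG. 19 can be sketched as follows. This is an illustration under assumptions, not the disclosed implementation: the GRAMMAR table, the prompt wording, and the hear callback (which plays a prompt and returns the subscriber's next utterance) are hypothetical, and exact phrase matching stands in for ASR. The description does not say what happens if the subscriber's selection at stage 1914 is also unrecognized, so the sketch simply returns None in that case.

```python
from typing import Callable, Dict, Optional

# Hypothetical grammar: several phrases may map to one command.
GRAMMAR: Dict[str, str] = {
    "read my mail": "read_mail",
    "get my mail": "read_mail",
    "read my schedule": "read_schedule",
}


def match(grammar: Dict[str, str], utterance: str) -> Optional[str]:
    """Stages 1904 and 1906 combined: compare the utterance to the grammar."""
    return grammar.get(utterance.strip().lower())


def run_recognition(grammar: Dict[str, str], hear: Callable[[str], str]) -> Optional[str]:
    """Return the command to execute at stage 1908, or None if nothing matched."""
    command = match(grammar, hear(""))                       # stages 1902 to 1906
    if command is None:                                      # stages 1910 and 1912
        command = match(grammar, hear("Sorry, could you please say that another way?"))
    if command is None:                                      # stage 1914: list the options
        options = "; ".join(sorted(grammar))
        command = match(grammar, hear("You can say: " + options))
    return command
```
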

Recognition grammars for a system such as system 900 can be defined in a grammar definition
language (GDL) and the recognition grammars specified in GDL can then be automatically translated into
machine executable grammars using commercially available software. For example, ASR software is 
commercially available from Nuance Corporation of Menlo Park, CA.

Computer Code Implementation

Based on the finite state machine implementation, the selected personality, the dialog, and the 
recognition grammar (e.g., GDL), all discussed above, voice user interface with personality 1002 can be 
implemented in computer code that can be executed on a computer, such as computer system 100, to provide a 
system, such as system 900, with a voice user interface with personality, such as voice user interface with 
personality 1002. For example, the computer code can be stored as source code or compiled and stored as 
executable code in a memory, such as memory 101. 

Accordingly, the present invention provides a voice user interface with personality. For example, the 
present invention can be used to provide a voice user interface with personality for a telephone system that 
provides various functionality and services, such as an email service, a news content service, a stock quote 
service, and a voice mail service. A system that includes a voice user interface or interacts with users via 
telephones or mobile phones would significantly benefit from the present invention. 

Although particular embodiments of the present invention have been shown and described, it will be 
obvious to those skilled in the art that changes and modifications may be made without departing from the 
present invention in its broader aspects, and therefore, the appended claims are to encompass within their 
scope all such changes and modifications that fall within the true scope of the present invention.

WE CLAIM:



1. An apparatus for a voice user interface with personality, the apparatus comprising:
logic that provides a voice user interface, the voice user interface outputting first voice signals, and
the voice user interface recognizing speech signals; and
logic that provides a personality, the logic that provides the personality interfacing with the logic that
provides the voice user interface to provide the voice user interface with personality.

2. The apparatus as recited in Claim 1 wherein the logic that provides the voice user interface and the
logic that provides the personality comprise computer software stored in a memory of a computer system, the
computer software being executed on a processor of the computer system.

3. The apparatus as recited in Claim 2 wherein the computer system comprises a microphone and
speaker, the speech signals being received from the microphone, and the first voice signals being transmitted
to the speaker.

4. The apparatus as recited in Claim 2 wherein the computer system is in communication with a
telephone, the speech signals being received from the telephone, and the first voice signals being transmitted
to the telephone.

5. The apparatus as recited in Claim 2 wherein the computer system comprises at least two
computers.

6. The apparatus as recited in Claim 1 wherein the voice user interface with personality comprises a
virtual assistant with personality.

7. The apparatus as recited in Claim 6 wherein the personality of the virtual assistant comprises a
predetermined tone or cadence of the first voice signals, a predetermined vocabulary, a predetermined
sentence construction, or a predetermined degree of assertiveness.

8. The apparatus as recited in Claim 6 wherein the first voice signals comprise a prompt, the prompt
being scripted for the personality of the virtual assistant.

9. The apparatus as recited in Claim 6 wherein the first voice signals comprise a recorded prompt,
the recorded prompt being acted and recorded for the personality of the virtual assistant.

10. The apparatus as recited in Claim 1 wherein the logic that provides the personality comprises
randomly selecting a prompt from multiple prompts available to provide a predetermined response.

11. The apparatus as recited in Claim 1 wherein the logic that provides the personality comprises
selecting a prompt, the prompt comprising an appropriate temporal prompt.

12. The apparatus as recited in Claim 1 wherein the logic that provides the personality comprises
selecting a prompt, the prompt comprising a term that was previously spoken by the user in a recognized
command.

13. The apparatus as recited in Claim 1 wherein the logic that provides the voice user interface
comprises recognizing multiple spoken commands as a predetermined command.

14. The apparatus as recited in Claim 1 wherein the logic that provides the voice user interface
comprises generating second voice signals, the second voice signals comprising synthesized voice signals that
correspond to text.

15. The apparatus as recited in Claim 1 wherein the logic that provides the personality selects a
smooth hand-off prompt to provide a smooth hand-off between the first voice signals and second voice
signals, the second voice signals being output by the logic that provides the voice user interface.

16. The apparatus as recited in Claim 15 wherein the logic that provides the personality selects a first
complete sentence for output by the first voice signals and a second complete sentence for output by the
second voice signals.

17. The apparatus as recited in Claim 16 wherein the first voice signals comprise the voice of a
virtual assistant, and the second voice signals comprise the voice of a helper to the virtual assistant.

18. The apparatus as recited in Claim 17 wherein the second voice signals comprise recorded voice
signals.

19. The apparatus as recited in Claim 17 wherein the second voice signals comprise synthesized
voice signals.

20. The apparatus as recited in Claim 1 wherein the logic that provides the personality comprises
controlling the selection of the first voice signals output by the voice user interface such that the voice user
interface with personality behaves consistently with social and emotional norms, including politeness, while
interacting with the user.

21. The apparatus as recited in Claim 20 wherein the first voice signals comprise a word that is
within a recognition grammar of the voice user interface.

22. The apparatus as recited in Claim 20 wherein the logic that provides the personality comprises
controlling the selection of the first voice signals output by the voice user interface such that the voice user
interface with personality addresses a user by a proper name of the user.

23. The apparatus as recited in Claim 20 wherein the voice user interface with personality suggests
responses available to the user that are polite and socially appropriate, thereby permitting the user to be polite
and socially appropriate while interacting with the voice user interface with personality.

24. The apparatus as recited in Claim 1 wherein the logic that provides the personality comprises
controlling the voice user interface in situations in which negative comments are needed.

25. The apparatus as recited in Claim 24 wherein the logic that provides the personality comprises
selecting a negative prompt, the negative prompt comprising a concise and plain statement of a problem
without blaming a user, and outputting a recording of the brief negative prompt at a lower volume.

26. The apparatus as recited in Claim 1 wherein the logic that provides the personality comprises
selecting a prompt based on a user's experience with using the voice user interface during a current session
and across sessions.

27. The apparatus as recited in Claim 26 wherein the logic that provides the personality comprises
selecting a shorter prompt based on the user's experience with using the voice user interface during the current
session and across sessions.

28. The apparatus as recited in Claim 26 wherein the logic that provides the personality comprises
selecting a longer help prompt if the user's input indicates a problem with increasing frequency during the
current session and across sessions.

29. The apparatus as recited in Claim 26 wherein the logic that provides the personality comprises
selecting a prompt that provides the available options to the user if the voice user interface does not recognize
a command spoken by the user or if the user has not spoken for a defined period of time.

30. The apparatus as recited in Claim 26 wherein the logic that provides the personality comprises
selecting a coaching prompt that provides a current state of interaction, provides commands that the user can
say at the current state of interaction, and provides the actions that would be taken in response to each of the
commands.

1 31. The apparatus as recited in Claim 1 wherein the logic that provides the personality comprises 

2 selecting an approximation prompt. 

1 32. The apparatus as recited in Claim 1 wherein the logic that provides the personality comprises 

2 providing the voice user interface with a first personality and a second personality. 

1 33. The apparatus as recited in Claim 32 wherein the first voice signals comprise the voice of the 

2 first personality, and second voice signals comprise the voice of the second personality. 

1 34. The apparatus as recited in Claim 32 wherein a first virtual assistant comprises the first 

2 personality, and a second virtual assistant comprises the second personality. 

1 35. The apparatus as recited in Claim 34 wherein the first virtual assistant and the second virtual 

2 assistant comprise different genders. 

1 36. The apparatus as recited in Claim 34 wherein a user selects the first virtual assistant or the second 

2 virtual assistant based on descriptions of the first virtual assistant and the second virtual assistant or based on 

3 interacting with the first virtual assistant and the second virtual assistant. 

1 37. The apparatus as recited in Claim 1 wherein the logic that provides the personality comprises 

2 selecting a humorous prompt.

1 38. The apparatus as recited in Claim 1 wherein the voice user interface with personality comprises a

2 virtual assistant for a voice-based desktop environment. 

1 39. The apparatus as recited in Claim 38 wherein the desktop environment comprises multiple 

2 objects, the virtual assistant being navigated among the multiple objects by a user. 

1 40. The apparatus as recited in Claim 1 further comprising: 

2 a recognition grammar stored in a memory, the recognition grammar comprising multiple phrases 

3 that a virtual assistant with a personality can recognize when spoken by a user, and the 

4 recognition grammar being selected based on the personality of the virtual assistant. 

1 41. The apparatus as recited in Claim 2 wherein the computer system comprises:

2 a telephone line card in communication with a telephone line for receiving speech input data from a 

3 user and speech output data from the logic that provides the voice user interface. 

1 42. The apparatus as recited in Claim 41 wherein the computer system further comprises: 

2 a network card, the network card in communication with a network. 

1 43. The apparatus as recited in Claim 1 wherein the logic that provides the voice user interface 

2 comprises: 

3 echo cancellation software, barge-in software, signal processing software, automatic speech 

4 recognition/natural language software, request for services software, and text-to- 

5 speech/recorded speech software. 

1 44. A method for a voice user interface with personality, the method comprising: 

2 executing a voice user interface, the voice user interface outputting first voice signals, the voice user 

3 interface recognizing speech signals; and 

4 controlling the voice user interface to provide the voice user interface with a personality.

1 45. The method as recited in Claim 44 wherein the voice user interface with personality comprises 

2 computer software stored in a memory of a computer system, the computer software being executed on a 

3 processor of the computer system. 

1 46. The method as recited in Claim 44 wherein the voice user interface with personality comprises a 

2 virtual assistant with personality. 

1 47. The method as recited in Claim 46 wherein the personality of the virtual assistant comprises a 

2 predetermined tone or cadence, a predetermined vocabulary, a predetermined sentence construction, or a 

3 predetermined degree of assertiveness. 

1 48. The method as recited in Claim 44 wherein the first voice signals comprise a prompt, the prompt 

2 being scripted for the personality of the voice user interface. 

1 49. The method as recited in Claim 44 wherein the first voice signals comprise a recorded prompt, 

2 the recorded prompt being acted and recorded for the personality of the voice user interface. 

1 50. The method as recited in Claim 44 wherein the controlling the voice user interface comprises 

2 randomly selecting a specific prompt that corresponds to a generic prompt. 

1 51. The method as recited in Claim 44 wherein the controlling the voice user interface comprises

2 selecting a prompt, the prompt comprising an appropriate temporal prompt. 

1 52. The method as recited in Claim 44 wherein the controlling the voice user interface comprises 

2 selecting a prompt to respond to a user, the prompt comprising a term that was previously spoken by the user 

3 in a recognized command. 

1 53. The method as recited in Claim 44 wherein the voice user interface comprises recognizing 

2 multiple spoken commands as a predetermined command. 

1 54. The method as recited in Claim 44 wherein the voice user interface comprises generating second 

2 voice signals, the second voice signals comprising synthesized voice signals that correspond to text. 

1 55. The method as recited in Claim 44 wherein the controlling the voice user interface comprises 

2 selecting a smooth hand-off prompt that provides a smooth hand-off between the first voice signals and 

3 second voice signals, the second voice signals being output by the voice user interface. 

1 56. The method as recited in Claim 55 wherein the controlling the voice user interface comprises 

2 selecting a first complete sentence for output by the first voice signals and a second complete sentence for 

3 output by the second voice signals. 

1 57. The method as recited in Claim 56 wherein the first voice signals comprise the voice of a virtual 

2 assistant and the second voice signals comprise the voice of a helper to the virtual assistant.

1 58. The method as recited in Claim 57 wherein the second voice signals comprise recorded voice 

2 signals. 

1 59. The method as recited in Claim 57 wherein the second voice signals comprise synthesized voice 

2 signals. 

1 60. The method as recited in Claim 44 wherein the controlling the voice user interface comprises

2 controlling the selection of the first voice signals output by the voice user interface such that the voice user 

3 interface behaves consistently with social and emotional norms, including politeness, while interacting with 

4 the user. 



1 61. The method as recited in Claim 60 wherein the controlling the voice user interface comprises

2 selecting a word that is within a recognition grammar to be output by the first voice signals.

1 62. The method as recited in Claim 60 wherein the controlling the voice user interface comprises

2 controlling the selection of the first voice signals output by the voice user interface such that the voice user 

3 interface addresses a user by a proper name of the user. 

1 63. The method as recited in Claim 60 wherein the speech signals comprise user commands spoken 

2 politely. 

1 64. The method as recited in Claim 44 wherein the controlling the voice user interface comprises 

2 controlling the voice user interface in situations in which negative comments are needed. 

1 65. The method as recited in Claim 64 wherein the controlling the voice user interface comprises 

2 selecting a negative prompt, and outputting a recording of the negative prompt at a lower volume. 

1 66. The method as recited in Claim 44 wherein the controlling the voice user interface comprises 

2 selecting a prompt based on a user's experience with using the voice user interface during a current session 

3 and across sessions. 

1 67. The method as recited in Claim 66 wherein the controlling the voice user interface comprises

2 selecting a shorter prompt based on the user's experience with using the voice user interface during the current 

3 session and across sessions. 

1 68. The method as recited in Claim 66 wherein the controlling the voice user interface comprises 

2 selecting a longer help prompt if the user's input indicates a problem with increasing frequency during the 

3 current session and across sessions. 

1 69. The method as recited in Claim 66 wherein the controlling the voice user interface comprises 

2 selecting a prompt that lists the available options to the user if the voice user interface does not recognize a 

3 command spoken by the user or if the user has not spoken for a defined period of time. 

1 70. The method as recited in Claim 66 wherein the controlling the voice user interface comprises 

2 selecting a coaching prompt that provides a current state of interaction, lists commands that the user can say at 

3 the current state of interaction, and lists the actions that would be taken in response to each of the commands. 

1 71. The method as recited in Claim 44 wherein the controlling the voice user interface comprises 

2 selecting an approximation prompt. 

1 72. The method as recited in Claim 44 wherein the controlling the voice user interface comprises

2 providing the voice user interface with a first personality and a second personality. 

1 73. The method as recited in Claim 72 wherein the first voice signals comprise the voice of the first 

2 personality, and second voice signals comprise the voice of the second personality. 

1 74. The method as recited in Claim 72 wherein a first virtual assistant comprises the first personality, 

2 and a second virtual assistant comprises the second personality. 

1 75. The method as recited in Claim 74 wherein the first virtual assistant and the second virtual 

2 assistant comprise different genders. 

1 76. The method as recited in Claim 74 wherein a user selects the first virtual assistant or the second 

2 virtual assistant based on descriptions of the first virtual assistant and the second virtual assistant or based on 

3 interacting with the first virtual assistant and the second virtual assistant.

1 77. The method as recited in Claim 44 wherein the voice user interface with personality comprises 

2 selecting a humorous prompt.


1 78. The method as recited in Claim 44 wherein the voice user interface with personality comprises a 

2 virtual assistant for a voice-based desktop environment.

1 79. The method as recited in Claim 78 wherein the desktop environment comprises multiple objects, 

2 the virtual assistant being navigated among the multiple objects by a user. 

1 80. The method as recited in Claim 44 further comprising: 

2 storing a recognition grammar in a memory, the recognition grammar comprising multiple phrases 

3 that the voice user interface can recognize when spoken by a user, and the grammar being 

4 selected based on the personality of the voice user interface. 

1 81. The method as recited in Claim 44 wherein the voice user interface comprises:

2 echo cancellation software, barge-in software, signal processing software, automatic speech 

3 recognition/natural language software, request for services software, and text-to- 

4 speech/recorded speech software. 






1 82. A data signal in a carrier wave for a voice user interface with personality, the data signal in a 

2 carrier wave comprising: 

3 first voice signals, the first voice signals being output by a voice user interface with personality; and 

4 speech signals, the voice user interface with personality recognizing the speech signals. 

1 83. The data signal in a carrier wave as recited in Claim 82 wherein the first voice signals comprise a 

2 prompt, the prompt being scripted for the personality of the voice user interface. 

1 84. The data signal in a carrier wave as recited in Claim 82 wherein the first voice signals comprise a 

2 recorded prompt, the recorded prompt being acted and recorded for the personality of the voice user interface. 

1 85. The data signal in a carrier wave as recited in Claim 82 wherein the first voice signals comprise a 

2 prompt randomly selected from multiple prompts available to provide a predetermined response. 

1 86. The data signal in a carrier wave as recited in Claim 82 wherein the first voice signals comprise a 

2 prompt, the prompt comprising an appropriate temporal prompt. 

1 87. The data signal in a carrier wave as recited in Claim 82 wherein the first voice signals comprise a 

2 prompt to respond to a user, the prompt comprising a term that was previously spoken by the user in a 

3 recognized command. 

1 88. The data signal in a carrier wave as recited in Claim 82 wherein the speech signals comprise a 

2 phrase, the phrase being recognized by the voice user interface with personality as a predetermined command. 

1 89. The data signal in a carrier wave as recited in Claim 82 further comprising: 

2 second voice signals, the second voice signals being output by the voice user interface with 

3 personality. 

1 90. The data signal in a carrier wave as recited in Claim 89 further comprising: 

2 third voice signals, the third voice signals being output by the voice user interface with personality, 

3 the third voice signals comprising a smooth hand-off between the first voice signals and the 

4 second voice signals. 

1 91. The data signal in a carrier wave as recited in Claim 89 wherein the first voice signals comprise a

2 first complete sentence and the second voice signals comprise a second complete sentence. 

1 92. The data signal in a carrier wave as recited in Claim 89 wherein the first voice signals comprise

2 the voice of a virtual assistant, and the second voice signals comprise the voice of a helper to the virtual 

3 assistant. 

1 93. The data signal in a carrier wave as recited in Claim 89 wherein the second voice signals 

2 comprise synthesized voice signals. 

1 94. The data signal in a carrier wave as recited in Claim 82 wherein the first voice signals comprise a 

2 word that is within a recognition grammar of the voice user interface with personality. 

1 95. The data signal in a carrier wave as recited in Claim 82 wherein the first voice signals comprise a 

2 proper name of the user. 

1 96. The data signal in a carrier wave as recited in Claim 82 wherein the first voice signals comprise a 

2 negative prompt, the negative prompt comprising a concise and plain statement of a problem without blaming 

3 a user, and the first voice signals being output at a lower volume. 

1 97. The data signal in a carrier wave as recited in Claim 82 wherein the first voice signals comprise a 

2 prompt that is selected based on a user's experience with using the voice user interface with personality during 

3 a current session and across sessions. 

1 98. The data signal in a carrier wave as recited in Claim 97 wherein the first voice signals comprise a 

2 longer help prompt if the user's input indicates a problem with increasing frequency during the current session 

3 and across sessions. 

1 99. The data signal in a carrier wave as recited in Claim 82 wherein the first voice signals comprise a 

2 coaching prompt that provides a current state of interaction, lists commands that the user can say at the current

3 state of interaction, and lists the actions that would be taken in response to each of the commands. 

1 100. The data signal in a carrier wave as recited in Claim 82 wherein the first voice signals comprise 

2 an approximation prompt. 

1 101. The data signal in a carrier wave as recited in Claim 82 wherein the voice user interface with 

2 personality comprises a first personality and a second personality. 

1 102. The data signal in a carrier wave as recited in Claim 101 wherein the first voice signals

2 comprise the voice of the first personality, and second voice signals comprise the voice of the second 

3 personality, the second voice signals being output by the voice user interface with personality. 

1 103. The data signal in a carrier wave as recited in Claim 101 wherein a first virtual assistant 

2 comprises the first personality, and a second virtual assistant comprises the second personality. 

1 104. The data signal in a carrier wave as recited in Claim 82 wherein the first voice signals comprise 

2 a humorous prompt. 
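
By way of illustration only, and without limiting the claims, the experience-based prompt selection recited above (a shorter prompt for an experienced user, a longer help prompt when problems recur, and random selection of a specific prompt that corresponds to a generic prompt) might be sketched as follows. The names UserHistory, PROMPTS, and select_prompt, the example prompt wording, and the numeric thresholds are all hypothetical and are not taken from the specification. A simple count of problems stands in here for the "increasing frequency" test of the claims.

```python
import random
from dataclasses import dataclass


@dataclass
class UserHistory:
    """Hypothetical counters kept for one user (assumed, not from the specification)."""
    successes_this_session: int = 0   # recognized commands in the current session
    successes_all_sessions: int = 0   # recognized commands in earlier sessions
    problems_this_session: int = 0    # unrecognized inputs or timeouts in the current session


# Hypothetical prompt table: one generic prompt maps to several specific prompts.
PROMPTS = {
    "ask_command": {
        "long": ["What would you like to do? You can say 'read mail' or 'call'."],
        "short": ["What next?", "Yes?"],
        "help": ["You are at the main menu. Say 'read mail' to hear messages, or 'call' to dial."],
    }
}

# Assumed thresholds, chosen only for the example.
PROBLEM_THRESHOLD = 2
EXPERIENCE_THRESHOLD = 5


def select_prompt(generic, history, rng=random):
    """Pick a specific prompt for a generic prompt, based on the user's history."""
    variants = PROMPTS[generic]
    experience = history.successes_this_session + history.successes_all_sessions
    if history.problems_this_session >= PROBLEM_THRESHOLD:
        style = "help"      # repeated problems: give the longer help prompt
    elif experience >= EXPERIENCE_THRESHOLD:
        style = "short"     # experienced user: give a shorter prompt
    else:
        style = "long"      # new user: give the full prompt
    # Random choice among the specific prompts available for this style.
    return rng.choice(variants[style])
```

In this sketch a user with no history receives the full prompt, a user with several successful commands across sessions receives one of the short variants, and a user who has run into repeated problems in the current session receives the help prompt regardless of prior experience.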